ABSTRACT

We investigate galaxy clustering and the correlations between galaxies and mass in
the ΛCDM cosmological model (inflationary cold dark matter with $\Omega_m = 0.4$, $\Omega_\Lambda = 0.6$,
$h = 0.65$, $n = 0.95$, $\sigma_8 = 0.8$), using a large, smoothed particle hydrodynamics
simulation (SPH, with $2 \times 144^3$ particles in a $50\,h^{-1}$ Mpc cube). Simulated galaxies

can be unambiguously identified as clumps of stars and cold gas a few kpc to a few 
tens of kpc across, residing in extended halos of hot gas and dark matter; the space 
density of the resolved galaxy population at z = 0 corresponds to that of observed
galaxies with luminosity $L > L_*/4$. We investigate the galaxy correlation function,
the pairwise velocity dispersion and mean pairwise velocity, and the second and third 
moments of counts-in-cells; we also investigate the galaxy-mass correlation function 
and the average extended mass distributions around galaxies, both of which can be 
measured via galaxy-galaxy lensing. For the most part, the predicted biases between 
galaxies and dark matter lead to good agreement with current observations, including: 
(1) a nearly constant comoving correlation length from z = 3 to z = 0 for mass-selected
galaxy samples of constant comoving space density; (2) an rms bias factor $b \approx 1$ at
z = 0; (3) a scale-dependent bias on small scales that transforms the curved dark matter
correlation function into a nearly power-law galaxy correlation function; (4) galaxy
pairwise dispersion and hierarchical skewness ratio $S_3$ in good agreement with observed
values, and lower than values for the dark matter by ~20%; (5) a ratio of galaxy-galaxy
to galaxy-mass correlation functions consistent with recent measurements from the Red
Cluster Sequence survey; and (6) a mean excess mass $\Delta M(260\,h^{-1}\,{\rm kpc})$ approximately
proportional to galaxy baryon mass $M_b$, in agreement with estimates from the Sloan
Digital Sky Survey (SDSS). All of these clustering properties vary with galaxy baryon 
mass and, more strongly, with the age of a galaxy's stellar population. The predicted 
dependences are in good qualitative agreement with the observed dependence of galaxy 
clustering and the galaxy-mass correlation function on galaxy type. The predicted ratio 
$\Delta M(260\,h^{-1}\,{\rm kpc})/M_b$ is lower than the SDSS estimates by a factor of ~1.5-2 for
galaxies with $M_b > 2 \times 10^{11}\,M_\odot$. A test with a higher resolution (smaller volume)
simulation suggests that this discrepancy is mostly a numerical artifact; if so, then the 
SDSS weak lensing comparison leaves little room for feedback or other astrophysical 
processes to reduce the stellar masses of luminous galaxies, at least given our adopted 
cosmological parameters. On the whole, our results show that the ΛCDM model and
the galaxy formation physics incorporated in the SPH simulation give a good account of 
observed galaxy clustering, but anticipated improvements in clustering and weak lensing 
measurements will soon test this picture in much greater detail. 

Subject headings: galaxies: formation — large-scale structure of universe 

1. Introduction 

The clustering of galaxies has long been an essential testing ground for cosmological models and 
the theory of galaxy formation, with comparisons between predicted and observed clustering driving 
much of the progress in cosmology during the 1970s and 1980s. Such comparisons provided early 
support for gravitational instability as the central mechanism of structure formation (e.g., Davis, 
Groth, & Peebles 1977), for initial conditions with a power spectrum redder than white noise (e.g.,
Gott & Rees 1975; Gott & Turner 1977), for a "bottom up" rather than "top down" sequence of
gravitational clustering (e.g., White, Frenk, & Davis 1983; Davis et al. 1985; Fry & Melott 1985), for
an approximately Gaussian distribution of primordial fluctuations (e.g., Gott et al. 1989; Weinberg
& Cole 1992; Bouchet et al. 1993), and for a power spectrum significantly redder than that predicted
by the "standard" ($\Omega_m = 1$, $h \equiv H_0/100\,{\rm km\,s^{-1}\,Mpc^{-1}} = 0.5$) cold dark matter scenario (e.g.,
Efstathiou et al. 1990; Maddox et al. 1990; Park et al. 1994). The growth of galaxy redshift surveys 
has led to measurements of increasing precision and detail, and cosmological N-body simulations 
have developed into a powerful tool for calculating the gravitational clustering of collisionless dark 
matter from specified initial conditions. The main obstacle to drawing stronger inferences from the 
data is the dependence of theoretical predictions on both the relatively straightforward physics of
gravitational clustering, which largely determines the distribution of dark matter, and the more 
complex physics of galaxy formation, which determines the relation between galaxies and mass, 
often referred to generically as "bias." 

In this paper, we study galaxy clustering and galaxy bias in a large ($2 \times 144^3$ particles,
$50\,h^{-1}$ Mpc cube), smoothed particle hydrodynamics (SPH) simulation of a low density, inflationary
cold dark matter universe with a cosmological constant (ΛCDM). Our goals are, first, to see whether
the currently leading cosmological model and "standard" galaxy formation physics as incarnated 
in our SPH simulation can reproduce existing observations, and, second, to give guidance for the 
physical interpretation of those observations and for the design of more demanding tests of the 
theoretical predictions. The N-body+hydro approach to ab initio predictions of galaxy bias has a 
substantial history, beginning with dissipative "sticky particle" simulations by Carlberg & Couchman
(1989) and continuing in the early 1990s with simulations using SPH (Katz, Hernquist, &
Weinberg 1992; Evrard, Summers, & Davis 1994) or Eulerian grid hydrodynamics (Cen & Ostriker
1992) to model galaxy formation in a representative (but small) cosmological volume. The present 
paper is a direct descendant of Katz et al. (1992), but we study a different cosmological model, we 
incorporate improvements to the physical treatment of radiative cooling and star formation (Katz, 
Weinberg, & Hernquist 1996, hereafter KWH), and, above all, we take advantage of the parallel
implementation of TreeSPH (Davé, Dubinski, & Hernquist 1997) and advances in computer technology
to simulate a volume more than 90 times larger than that of Katz et al. (1992), at similar
resolution. Our analysis overlaps significantly with other recent studies using large volume SPH
(Pearce et al. 1999, 2001; Yoshikawa et al. 2001) or Eulerian hydrodynamics (Cen & Ostriker 2000)
simulations. Relative to these investigations, we have higher mass and/or spatial resolution and a 
somewhat smaller simulation volume, as discussed in §2 below. 

Two other approaches to ab initio predictions of bias have gained currency in recent years: high-resolution,
collisionless N-body simulations that identify galaxies with "subhalos" in the dark matter
distribution (e.g., Colin et al. 1999; Kravtsov & Klypin 1999), and a hybrid method that combines
N-body simulations of the dark matter component with semi-analytic treatments of the galaxy
formation physics (e.g., Kauffmann, Nusser, & Steinmetz 1997; Governato et al. 1998; Kauffmann
et al. 1999a,b; Benson et al. 2000a,b, 2001; Hatton et al. 2002). We will discuss comparisons of
our results to those from other hydrodynamic simulations and from the high-resolution N-body 
and hybrid approaches as they arise. We carry out a detailed comparison between our simulation 
and the semi-analytic model of Benson et al. (2000a) in a separate paper that focuses on the "halo 
occupation distribution" (HOD) predicted by the two methods (Berlind et al. 2002). The HOD 
description can be used to calculate many different galaxy clustering statistics, and it helps explain 
the origin of bias in a physically intuitive manner (see Berlind & Weinberg 2002 and references 
therein). Based on this comparison, we conclude that our SPH approach and Benson et al.'s
(2000a) semi-analytic method should yield similar predictions for most galaxy clustering statistics. 

Here we focus mainly on the "classic" measurements of galaxy clustering (the two-point
correlation function, variance of galaxy counts, and pairwise velocity moments) and on one of the
simplest measures of higher-order clustering, the third moment of counts-in-cells. Large redshift 
surveys like the 2dF Galaxy Redshift Survey (2dFGRS) and the Sloan Digital Sky Survey (SDSS) 
now allow precise measurements of these quantities for multiple classes of galaxies, defined by 
luminosity, color, morphology, or spectral type (see, e.g., Norberg et al. 2001, 2002b; Zehavi et al.
2002, and references therein). While we cannot model these variations with galaxy type in detail 
— our simulation volume is far smaller than these surveys, and we do not resolve the morphology 
of the simulated galaxies — we can examine the predicted trends of clustering with baryon mass, 
stellar population age, and local environment. We concentrate on present-day clustering, but we 
also compute the evolution of the two-point function, which can be compared to results from deep 
redshift surveys and to studies of Lyman-break galaxies at z ~ 3 (e.g., Adelberger et al. 1998,
2002). The discovery that the clustering of Lyman-break galaxies is similar to that of galaxies
at z = 0, despite the weaker expected clustering of the underlying mass distribution, provides
strong evidence that the bright galaxy population was highly biased at z ~ 3, even if galaxies
roughly trace mass today. 

Galaxy-galaxy weak lensing is an important new observational probe of the relation between 
galaxies and dark matter, measuring the galaxy-mass cross-correlation function and the extended 
mass distributions around galaxies of different types and in different environments. We will devote 
considerable attention to modeling recent observations of this phenomenon. The closest similar 
efforts are those of White, Hernquist, & Springel (2001), who present predictions from a hydrodynamic
simulation at z = 1 and z = 0.5, and Guzik & Seljak (2001) and Yang et al. (2002), who
analyze the N-body+semi-analytic galaxy distribution of Kauffmann et al. (1999a).

One strength of the hydrodynamic simulation approach is that it predicts properties of the
intergalactic medium (IGM) in addition to the galaxy population. We have used the simulation
analyzed here to model the phase distribution of baryons in the present day universe (Davé et al.
2001), the X-ray background (Croft et al. 2001), X-ray emission from galaxy groups (Davé, Katz,
& Weinberg 2002), and X-ray absorption by the diffuse IGM (Chen et al. 2002). In combination
with other simulations, we have used it to study cooling radiation and sub-millimeter emission 
from young galaxies (Fardal et al. 2001, 2002), the relative importance of mergers and smooth 
accretion in galaxy assembly (Murali et al. 2002), and the correlations between galaxies and the 
Lyα forest at high redshift (Kollmeier et al. 2002). The observed distributions and correlations of
galaxy properties, in particular the luminosity function and the Tully-Fisher (1977) relation, are 
also essential tests of the galaxy formation model, and any discrepancies with these observations 
can provide insight into the model's failings. We will consider these characteristics of the galaxy 
population in a separate paper (Katz et al., in preparation), though the comparison to observed 
galaxy-galaxy lensing measurements will bring up some of the same issues here. We proceed to a 
description of our numerical methods in §2, to predictions of galaxy clustering in §3, and to the 
galaxy-mass correlation function and comparison to lensing measurements in §4. We recap our 
findings and discuss their implications in §5. 

2. Simulating Galaxy Formation

2.1. Simulation

We employ a hydrodynamic simulation of a ΛCDM model with $\Omega_m = 0.4$, $\Omega_\Lambda = 0.6$, $h = 0.65$,
$\Omega_b = 0.02\,h^{-2} = 0.0473$, inflationary spectral index $n = 0.95$, and a power spectrum normalization
$\sigma_8 = 0.80$. This model is consistent with most available observational constraints (Bahcall et al.
1999; Jaffe et al. 2001), including COBE normalization (Bennett et al. 1996), cluster masses (Eke,
Cole, & Frenk 1996), cosmic shear (Hoekstra, Yee, & Gladders 2002, and references therein), the
deuterium abundance in high-redshift Lyman limit systems (Tytler, Fan, & Burles 1996; Burles &
Tytler 1997, 1998), the Hubble diagram of Type Ia supernovae (Riess et al. 1998; Perlmutter et
al. 1999), and the flux power spectrum of the Lyα forest (Croft et al. 1999, 2002; McDonald et al.
2000). 

Our initial conditions represent a random $50\,h^{-1}$ Mpc (comoving) cube, realized with $144^3$ gas
and $144^3$ dark matter particles in a triply periodic volume, yielding particle masses of
$m_{\rm gas} = 8.5 \times 10^8\,M_\odot$ and $m_{\rm dark} = 6.3 \times 10^9\,M_\odot$, respectively. These initial conditions are evolved from
redshift z = 49 to z = 0 using Parallel TreeSPH (Davé, Dubinski, & Hernquist 1997). We soften the
gravitational force via a $10\,h^{-1}$ kpc (comoving) spline kernel, equivalent to a Plummer softening of
$7\,h^{-1}$ kpc. For particle timesteps we use the criteria described by KWH and Quinn et al. (1997),
setting the tolerance parameter $\eta$ (defined in these papers) to 0.4. The simulation was evolved to
z = 0.5 on the Cray T3E at SDSC and finished on the SGI Origin 2000 at NCSA. 
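
As a consistency check, the quoted particle masses can be recovered from the cosmological parameters, the box size, and the particle number. The sketch below assumes the standard critical density of $2.775 \times 10^{11}\,h^2\,M_\odot\,{\rm Mpc}^{-3}$, a value not stated in the text; the function and variable names are illustrative only.

```python
# Illustrative check of the particle masses.
# RHO_CRIT and all names below are assumptions, not taken from the paper.
RHO_CRIT = 2.775e11  # critical density today, in h^2 Msun / Mpc^3 (standard value)


def particle_mass(omega, box_hinv_mpc, n_side, h):
    """Mass (Msun) of one particle of a component with density parameter `omega`.

    box_hinv_mpc : comoving box side in h^-1 Mpc
    n_side       : particles per side (n_side**3 particles of this species)
    """
    volume = box_hinv_mpc ** 3                  # (h^-1 Mpc)^3
    total_mass = omega * RHO_CRIT * volume / h  # h^-1 Msun converted to Msun
    return total_mass / n_side ** 3


h = 0.65
omega_m = 0.4
omega_b = 0.02 / h ** 2  # baryon density parameter, about 0.0473

m_gas = particle_mass(omega_b, 50.0, 144, h)
m_dark = particle_mass(omega_m - omega_b, 50.0, 144, h)

print(f"m_gas  = {m_gas:.2e} Msun")
print(f"m_dark = {m_dark:.2e} Msun")
```

With these inputs, the gas and dark matter particle masses come out close to the quoted $8.5 \times 10^8$ and $6.3 \times 10^9\,M_\odot$.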

We include radiative cooling for primordial composition gas as described by KWH. The tests 
in Weinberg et al. (1997) show that photoheating by an ionizing background artificially suppresses 
the formation of low mass galaxies in a simulation of this resolution, but that a simulation evolved 
without an ionizing background produces much the same galaxy population (above its resolution 
limit) as a higher resolution simulation with an ionizing background. Though Weinberg et al. (1997) 
considered an $\Omega_m = 1$ CDM model rather than ΛCDM, we expect their conclusions to carry over
to this model, and we therefore evolve the simulation with no ionizing background. 

We incorporate star formation using the algorithm described and tested by KWH. This algorithm
converts cold, dense gas into collisionless stars at a rate governed by the local dynamical and
cooling timescales, returning supernova feedback as thermal energy to the surrounding medium.
The tests in KWH and Weinberg et al. (1997) show that the resulting galaxy population is insensitive
to the one free parameter of the algorithm and is similar to that identified using cold gas in
a simulation without star formation; it is gravity, gas dynamics, and radiative cooling that determine
where galaxies form and how massive they will be. A radical change to the star formation
or feedback algorithm might alter the galaxy population in a significant way, but provided that it 
gave the same relative masses and ages of the galaxies (more precisely, that it preserved their rank 
order), it would not affect the clustering results presented here, except for the absolute values of 
mass-to-light ratios discussed in §4. 

Our numerical approach is similar to those used by Pearce et al. (1999, 2001) and Yoshikawa
et al. (2001), but we have opted for a somewhat smaller volume and larger particle number, and
consequently higher resolution. Pearce et al. use $128^3$ particles in a $70\,h^{-1}$ Mpc cube, so our mass
resolution is higher by $(70/50)^3 (144/128)^3 = 3.9$ (though the ratio of SPH particle masses is only
2.9 because Pearce et al. use a lower value of $\Omega_b$). The gravitational softening of the Pearce
et al. simulation is similar to ours at z = 0 ($10\,h^{-1}$ kpc vs. our $7\,h^{-1}$ kpc), but larger at higher
redshift, since they keep the softening fixed in physical coordinates at z < 2.5. The mass resolution
of the Yoshikawa et al. (2001) simulation is similar to that of Pearce et al. ($128^3$ particles in a
$75\,h^{-1}$ Mpc cube), and the (Plummer equivalent) gravitational softening is $41\,h^{-1}$ kpc, constant in
comoving coordinates. The most important methodological difference is that we convert cold gas 
into collisionless stars, while Pearce et al. and Yoshikawa et al. "decouple" cold gas from hot gas 
when computing SPH densities but leave the cold gas subject to SPH forces and dissipation. Despite 
the differences in numerical resolution, input physics, and implementation details, we find quite 
good agreement with these previous investigations where our results overlap, as discussed in §3. 
The cosmological parameters of the three ΛCDM simulations are similar but not identical. (Pearce
et al. investigate an $\Omega_m = 1$ model in addition to ΛCDM.)
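
The mass-resolution comparison above is a ratio of box volumes times a ratio of particle numbers. The short sketch below reproduces it under the simplifying assumption that both simulations share the same mean matter density (the text notes the cosmological parameters are similar but not identical); the function name is hypothetical.

```python
def mass_resolution_ratio(box_other, n_other, box_ours, n_ours):
    """Ratio of mean mass per particle (other / ours) at equal mean density.

    Box sides are in h^-1 Mpc; n_* are particles per side for one species.
    """
    return (box_other / box_ours) ** 3 * (n_ours / n_other) ** 3


# Pearce et al.: 128^3 particles in a 70 h^-1 Mpc cube.
ratio_pearce = mass_resolution_ratio(70.0, 128, 50.0, 144)
# Yoshikawa et al.: 128^3 particles in a 75 h^-1 Mpc cube.
ratio_yoshikawa = mass_resolution_ratio(75.0, 128, 50.0, 144)

print(f"Pearce et al.:    {ratio_pearce:.1f}")
print(f"Yoshikawa et al.: {ratio_yoshikawa:.1f}")
```

The ratio for gas particles alone also depends on each simulation's $\Omega_b$, which this sketch does not model.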

Cen & Ostriker (2000) have investigated galaxy clustering and bias in the ΛCDM model with
a $512^3$ Eulerian mesh simulation of a $100\,h^{-1}$ Mpc volume. The initial mass resolution of this
simulation is higher than ours, but the force resolution is much lower, with individual mesh cells
of $200\,h^{-1}$ kpc. Because the simulation does not track individual galaxies within high density regions,
Cen & Ostriker examine stellar mass weighted clustering statistics rather than the galaxy number
weighted statistics that we investigate here. Nonetheless, our conclusions about galaxy bias on
the scales $r > 1\,h^{-1}$ Mpc resolved by the Cen & Ostriker simulation are qualitatively similar, as
discussed in §3.



2.2. Galaxy Identification 

Figure 1 illustrates the state of the simulation at z = 0, showing dark matter, gas, and star 
particles projected through the entire volume in a quarter of the box, and in expanded views 6, 1.5, 
and $0.375\,h^{-1}$ Mpc on a side centered on the most massive system in our volume. The star particles
are clumped into groups of extremely high overdensity, from a few kpc to a few tens of kpc across. 
Dark halos with the mass scale of galaxy groups and clusters contain many such dense clumps in 
addition to a smooth distribution of hot gas. Figure 1 shows that there is virtually no ambiguity in 
identifying the galaxies in this simulation; one simply needs an automated algorithm that picks out 
the distinct clumps of stars and cold gas. The algorithm that we use is Spline Kernel Interpolative 
DENMAX (SKID), which selects gravitationally bound groups of particles that are associated
with a common local density maximum (see KWH). SKID selects essentially the same population 



SKID: http://www-hpcc.astro.washington.edu/tools/SKID/

Fig. 1. Projections through the simulation volume at z = 0. Blue particles are dark matter, red
are gas, and yellow are baryonic particles where star formation has occurred. The size of each view 
is indicated. The expanded views are centered on the largest galaxy, seen near the left edge of the 
upper left panel. All panels show the projection through the entire simulation volume. 

of objects that one would pick out by eye from a representation like Figure 1.

Figure 2 shows the baryonic mass function of the simulated galaxy population at z = 3, 1, 
and 0, where $M_b$ represents the total mass of a galaxy's stars and associated cold (T < 30,000 K)
gas. The number of galaxies steadily increases with time, though the rate of new galaxies forming
is dramatically slower from z = 1 to z = 0 than from z = 3 to z = 1. We will discuss the z = 0 luminosity
function and Tully-Fisher relation in a separate paper (Katz et al., in preparation), and we have
elsewhere compared predictions of the z = 3 luminosity function from higher resolution simulations 
(of smaller volumes) to measurements of the Lyman-break galaxy population (Weinberg et al. 
1999, 2002). The dashed vertical line in Figure 2 marks the mass of 60 SPH particles, and all 
of the simulated mass functions turn over shortly below this threshold. Our tests using a suite 
of simulations of different resolution show that galaxies below this threshold have underestimated 
masses or may be missed entirely, while the masses and locations of galaxies above this threshold 
are generally robust (see Weinberg et al. 1999). The dotted line shows the mass function at z = 0
from a higher resolution ($\epsilon = 3\,h^{-1}$ kpc, $m_{\rm gas} = 1.04 \times 10^8\,M_\odot$), smaller volume ($22.22\,h^{-1}$ Mpc cube)
simulation having $2 \times 128^3$ particles and the same cosmological parameters, analyzed in the same
way; a comparison with the solid line clearly shows that the turnover below the demarcated mass
limit is due to numerical effects. We therefore adopt a baryon mass of $M_b = 60\,m_{\rm gas} = 5 \times 10^{10}\,M_\odot$
as our completeness limit, and we analyze the clustering only of galaxies above this threshold. 
Note that the mass function comparison also suggests that the lower resolution simulation may 
overestimate the baryon masses of the largest galaxies, a point that we will return to in §4. 

Out of 4632 total SKID-identified galaxies at z = 0, 2571 satisfy our $M_b > 60\,m_{\rm gas}$ completeness
criterion. At z = 1, 2, 3 and 4, the corresponding fractions are 2011/4012, 1034/2488, 420/1177, and
104/384. The sample of all galaxies with $M_b > 60\,m_{\rm gas}$ will be referred to as our "complete sample"
of galaxies. At z = 0, the space density of the complete sample, $2571/(50\,h^{-1}\,{\rm Mpc})^3 = 0.02\,h^3\,{\rm Mpc}^{-3}$,
corresponds to that of observed galaxies brighter than about $L_*/4$ (Blanton et al. 2001), where $L_*$
is the characteristic luminosity of the Schechter (1976) function fit.
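
The completeness cut and the resulting space densities are simple to reproduce. The following sketch uses hypothetical helper names; the galaxy counts are the ones quoted above, and the 60-particle limit is applied to a list of baryon masses in solar units.

```python
def complete_sample(baryon_masses, m_gas, n_min=60):
    """Return the baryon masses above the completeness limit n_min * m_gas."""
    limit = n_min * m_gas
    return [m for m in baryon_masses if m > limit]


def space_density(n_gal, box_hinv_mpc):
    """Comoving number density in h^3 Mpc^-3 for n_gal objects in a periodic cube."""
    return n_gal / box_hinv_mpc ** 3


# (resolved, total) SKID galaxy counts at each redshift, as quoted in the text.
counts = {0: (2571, 4632), 1: (2011, 4012), 2: (1034, 2488), 3: (420, 1177), 4: (104, 384)}

for z, (n_complete, n_total) in counts.items():
    fraction = n_complete / n_total
    density = space_density(n_complete, 50.0)
    print(f"z = {z}: resolved fraction {fraction:.2f}, density {density:.4f} h^3 Mpc^-3")
```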

3. Galaxy Clustering 

Figure 3 illustrates the evolution of structure in a $10\,h^{-1}$ Mpc thick slice of the simulation
volume, running from z = 3 (top) to z = 1 (middle) to z = 0 (bottom). The left hand panels
show the dark matter distribution, which exhibits the usual behavior seen in N-body simulations:
a steady increase of contrast between growing filaments and emptying voids, and fragmentation of
filaments into increasingly massive clumps. The middle column shows the clustering of a subset of
galaxies with nearly constant comoving space density, namely the 420 resolved galaxies at z = 3
and the 500 most massive galaxies at z = 1 and z = 0. We will refer to this subset below as our
"$L_*$ sample." The evolution of galaxy clustering is markedly different from that of the dark matter.
The network of filaments and voids is already present at z = 3, with nearly the full contrast that 
it achieves by z = 0. The principal change in the clustering of this sample is on small scales, with 

Fig. 2. The baryonic mass function of SKID-identified galaxies at redshifts z = 0, 1, 3 (solid,
dashed, and dot-dashed histograms, respectively) in the simulation. The dotted
histogram shows the z = 0 mass function from a higher resolution simulation of a smaller volume.
Solid and dotted histograms diverge below $M_b = 5 \times 10^{10}\,M_\odot$ (vertical dashed line), corresponding
to $60\,m_{\rm gas}$ in the large volume simulation, which we adopt as our threshold for identifying resolved
galaxies.

Fig. 3. Structure in a $10\,h^{-1}$ Mpc thick slice of the $50\,h^{-1}$ Mpc cube at z = 3 (top), z = 1 (middle),
and z = 0 (bottom). The left column shows a randomly selected 5% of the dark matter particles.
The middle column shows the 420 resolved galaxies at z = 3 and the 500 most massive galaxies at
z = 1 and z = 0. The right column shows all resolved galaxies at the three redshifts, with the area
of each point proportional to the galaxy baryon mass and the smallest points corresponding to our
resolution threshold of $5 \times 10^{10}\,M_\odot$.

the collapse of some extended structures into tight clumps.

The right column of Figure 3 shows the full resolved galaxy population at the three redshifts, 
with a point size that reflects the galaxy baryon mass. Here one can see the increase in the number 
of resolved galaxies (from 420 to 2011 to 2571), the growth in the average mass of these galaxies, 
and the preferential location of the most massive galaxies in dense systems at z = 0. The changing 
number density and size of points makes it difficult to gauge the evolution of clustering strength 
visually, but we will see below that the correlation function of this complete galaxy sample stays 
nearly constant from z = 3 to z = 0.



The two-point correlation function is the most well studied measure of galaxy clustering, in 
part because it is relatively straightforward to estimate from angular or redshift survey data, in 
part because it contains complete statistical information on fluctuations in the linear regime if these 
fluctuations are Gaussian, and, perhaps most of all, because its observed form is remarkably simple. 
For most galaxy samples, the correlation function is well described by a power law,

$$\xi(r) = \left(\frac{r}{r_0}\right)^{-\gamma}, \qquad (1)$$

on scales $r \sim 0.01 - 10\,h^{-1}$ Mpc (Totsuji & Kihara 1969; Peebles 1974; Gott & Turner 1979).
However, the values of $r_0$ and, to a lesser extent, $\gamma$, depend on the luminosity, color, and type of
the sample galaxies (e.g., Guzzo et al. 1997; Norberg et al. 2001, 2002b; Zehavi et al. 2002, and
numerous references therein). We list determinations of $r_0$ and $\gamma$ from a number of recent redshift
surveys in Table 1. While the varying values of $r_0$ and $\gamma$ partly reflect statistical uncertainties,
they arise mainly from the different selection criteria of the samples from which they are measured.
In most cases, the values quoted in Table 1 are derived from flux-limited galaxy samples, and
the weighting of galaxies of different luminosities depends on the specific procedure adopted for
estimating $\xi(r)$. The quoted values for the 2dFGRS are derived from a volume-limited sample of
galaxies with absolute magnitudes close to the characteristic magnitude $M_*$ obtained for 2dFGRS
galaxies. The quoted values for the SDSS are from a flux-limited sample, but they are similar to
those obtained for a volume-limited sample with absolute magnitudes close to $M_*$. The difference
between the 2dFGRS and SDSS correlation lengths, $4.9 \pm 0.3\,h^{-1}$ Mpc vs. $6.1 \pm 0.2\,h^{-1}$ Mpc (or
$6.3 \pm 0.8\,h^{-1}$ Mpc for the volume-limited sample) presumably reflects the difference between blue
and red selection. 
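
To give a feel for what the differing fit parameters imply, the sketch below evaluates the power-law form $\xi(r) = (r/r_0)^{-\gamma}$ for the 2dFGRS and SDSS values quoted above. The function and dictionary names are illustrative, and the chosen separations are arbitrary points within the range where the power law is said to hold.

```python
def xi_power_law(r, r0, gamma):
    """Power-law correlation function xi(r) = (r / r0) ** (-gamma).

    r and r0 are comoving separations in h^-1 Mpc.
    """
    return (r / r0) ** (-gamma)


# (r0 in h^-1 Mpc, gamma) as quoted in the text for the two surveys.
SURVEY_FITS = {
    "2dFGRS": (4.9, 1.71),
    "SDSS": (6.1, 1.75),
}

for name, (r0, gamma) in SURVEY_FITS.items():
    values = ", ".join(f"xi({r:g}) = {xi_power_law(r, r0, gamma):.3g}" for r in (0.1, 1.0, 10.0))
    print(f"{name}: {values}")
```

By construction $\xi(r_0) = 1$, so a larger correlation length at similar slope means a higher clustering amplitude at every separation.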

Early N-body simulations showed that gravitational clustering produces an approximate power-law
$\xi(r)$, especially if the initial power spectrum has a slope $n \approx -1$ (Gott, Turner, & Aarseth 1979).
However, the much higher precision correlation functions calculated with modern simulations show
significant departures from a power-law $\xi(r)$ for CDM or power-law initial spectra (e.g., Jenkins
et al. 1998 and references therein). The origin of these departures can be understood in analytic 

Table 1. Real-space correlation function parameters $r_0$ and $\gamma$ from our simulation and recent
redshift surveys.

Survey              r_0 (h^-1 Mpc)   γ               Reference
All Simulated       3.96 ± 0.30      1.78 ± 0.05
L_* Simulated       4.52 ± 0.38      2.00 ± 0.08
200 Most Massive    5.20 ± 0.52      2.05 ± 0.09
Stromlo-APM         5.1 ± 0.2        1.71 ± 0.05     Loveday et al. (1995)
CfA2/SSRS2^a        5.8              1.8             Marzke et al. (1995)
LCRS                5.06 ± 0.12      1.86 ± 0.034    Jing, Mo, & Borner (1998)
Durham/UKST         5.1 ± 0.3        1.6 ± 0.1       Ratcliffe et al. (1998)
ESP                 4.15 ± 0.2       1.67 ± 0.08     Guzzo et al. (2000)
2dFGRS^b            4.9 ± 0.3        1.71 ± 0.06     Norberg et al. (2001)
SDSS^c              6.1 ± 0.2        1.75 ± 0.03     Zehavi et al. (2002)
PSCz^d              3.7              1.69            Jing, Borner, & Suto (2002)

^a Based on figure 3 of the Marzke et al. (1995) analysis of CfA2 and SSRS2.

^b We quote values for the volume-limited sub-sample with $-20.5 < M_{b_J} < -19.5$,
close to the value of $M_*$ found by Norberg et al. (2002a) for the 2dFGRS.

^c We quote values for the full flux-limited sample. For a volume limited sample
with $-21.5 < M_r < -20.0$, close to the Blanton et al. (2001) value of $M_*$ for the
SDSS, Zehavi et al. (2002) find $r_0 = 6.3 \pm 0.8$ and $\gamma = 1.80 \pm 0.09$.

^d Note that these are IRAS-selected galaxies and therefore preferentially late
types.

models of matter clustering (Hamilton et al. 1991; Peacock & Dodds 1996; Ma & Fry 2000; Seljak
2000; Scoccimarro et al. 2001). Thus, the observed power-law form of $\xi(r)$ requires either special
features in the primordial power spectrum or scale-dependent bias between the galaxy and dark 
matter correlation functions at the present day. 

Figure 4 illustrates the evolution of $\xi(r)$ in our simulation from z = 3 to z = 0. The top panel
shows results for the complete galaxy sample, i.e., all galaxies above our $60\,m_{\rm gas} = 5 \times 10^{10}\,M_\odot$
resolution threshold at each redshift, corresponding to the right hand column of
Figure 3. As noted in §2.2, there are 420, 1034, 2011, and 2571 galaxies satisfying this criterion at
z = 3, 2, 1, and 0, respectively. Since the strength of galaxy clustering depends on mass (see §3.2
below), it is useful to identify samples of specified space density for comparison to observations.
The middle panel shows results for the 500 most massive galaxies present at each redshift (except
at z = 3, where it contains only the 420 galaxies above the resolution threshold). The ranking of
each galaxy is based on its total mass of stars and cold (T < 30,000 K) gas. The comoving space
density of this sample, $500/(50\,h^{-1}\,{\rm Mpc})^3 = 0.004\,h^3\,{\rm Mpc}^{-3}$, is similar to that of $L_*$ galaxies today
(Blanton et al. 2001) and similar to (but slightly higher than) the space density of Lyman-break
galaxies in the spectroscopic samples of Adelberger et al. (1998). We therefore refer to this as our
"$L_*$ sample," illustrated by the middle column of Figure 3. At z = 0, the baryon mass threshold
for this sample is $M_b = 2.9 \times 10^{11}\,M_\odot$. The bottom panel of Figure 4 shows the dark matter $\xi(r)$.
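
Selecting a sample of fixed space density amounts to ranking galaxies by baryon mass and keeping the top N. The sketch below illustrates this with a hypothetical helper and toy masses (not simulation data); when fewer than N galaxies are available, as at z = 3, it simply returns them all.

```python
def most_massive(baryon_masses, n_select):
    """Return the n_select most massive galaxies and the implied mass threshold.

    If fewer than n_select galaxies exist, all of them are returned.
    The threshold is the mass of the least massive selected galaxy (None if empty).
    """
    ranked = sorted(baryon_masses, reverse=True)
    sample = ranked[:n_select]
    threshold = sample[-1] if sample else None
    return sample, threshold


# Toy masses in solar units, for illustration only.
toy_masses = [3.0e11, 8.0e10, 5.5e11, 1.2e11, 6.0e10]
sample, threshold = most_massive(toy_masses, 3)
print(sample, threshold)

# Space density of a 500-galaxy sample in the 50 h^-1 Mpc cube, in h^3 Mpc^-3.
print(500 / 50.0 ** 3)
```

Because only rank order enters, any change to the star formation model that preserves the mass ranking would leave the selected sample unchanged.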

In each panel of Figure 4, we show as a reference the power-law fit to the SDSS correlation 
function found by Zehavi et al. (2002). Comparison of our predictions to results from a flux-selected
observational sample rests on the implicit assumption that the luminosity of a galaxy is
an approximately monotonic function of its mass, and we use the SDSS result in preference to
the 2dFGRS result of Norberg et al. (2001) because this assumption seems more applicable to the
r-band selection used in the SDSS. Lyman-break galaxy samples at z ~ 3 are selected in the rest-frame
UV, where the instantaneous star formation rate is more important than the stellar mass in
determining galaxy luminosity; however, in our simulations, these star formation rates are fairly 
well correlated (though not perfectly so) with baryon mass (Weinberg et al. 2002). 

The main source of statistical error in our predictions is the finite number of coherent structures 
present in our $(50\,h^{-1}\,{\rm Mpc})^3$ simulation volume. We estimate the error bars on the correlation
function and on other statistical quantities that we compute later by applying jackknife resampling
to the eight octants of the simulation cube. Specifically, we create eight jackknife subsamples by
deleting each of the eight octants of the cube in turn, and we estimate $\xi(r)$ for each subsample; to
avoid boundary effects that would complicate $\xi(r)$ estimation and reduce our effective volume, we
identify pairs using the full periodic cube and assign each pair to an octant based on the position 
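of one of its particles. The $1\sigma$ error bar $\sigma_i$ assigned to the value of $\xi(r)$ at $r = r_i$ is

$$\sigma_i^2 = \frac{N-1}{N} \sum_{j=1}^{N} \left(\xi_{i,j} - \xi_i\right)^2, \qquad (2)$$

where $\xi_{i,j}$ is the estimate in subsample $j$, $\xi_i$ is the estimate from the full cube, and $N = 8$. This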

Fig. 4. Correlation functions $\xi(r)$ for the complete galaxy sample (top panel), the $L_*$ galaxy
sample (middle), and the dark matter (bottom), at redshifts z = 0, 1, 2, 3. Jackknife error bars
are plotted for the z = 0 correlation function. Open circles show a power law with the parameters
measured from the SDSS by Zehavi et al. (2002), $r_0 = 6.1\,h^{-1}$ Mpc and $\gamma = 1.75$.
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approach is similar to using the dispersion among the values of ^(r) from the eight octants (in 
which case one would divide by 8^/^ to get the error on the mean), but it is more robust because 
each of the subsamples is closer in size to the full sample. Zehavi et al. (2002) use this method to 
estimate statistical errors on the observed correlation functions of SDSS galaxies. The jackknife 
error estimates automatically incorporate the contribution from galaxy shot noise in addition to 
the finite number of structures, but the latter usually dominates by a large factor over the former 
(which would be represented by "Poisson error bars" ) . 
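
The delete-one-octant jackknife described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for the paper: the function names, the array layout, and the assumption that removing one octant simply rescales the expected random pair counts by 7/8 are all ours.

```python
import numpy as np

def octant_index(pos, box):
    """Label each position in a periodic cube [0, box)^3 with an octant 0..7."""
    upper = (np.asarray(pos, dtype=float) >= 0.5 * box).astype(int)
    return upper[:, 0] + 2 * upper[:, 1] + 4 * upper[:, 2]

def delete_one_xi(dd_by_octant, rr_full):
    """xi(r) for the full cube and for each delete-one-octant subsample.

    dd_by_octant[k, i]: pair counts in radial bin i assigned to octant k
                        (each pair is assigned via one of its members).
    rr_full[i]:         expected random pair counts in bin i for the full cube.
    Assumption: dropping one of n octants scales the expected counts by (n-1)/n.
    """
    dd = np.asarray(dd_by_octant, dtype=float)
    rr = np.asarray(rr_full, dtype=float)
    n = dd.shape[0]
    dd_total = dd.sum(axis=0)
    xi_full = dd_total / rr - 1.0
    xi_sub = (dd_total - dd) / (rr * (n - 1) / n) - 1.0
    return xi_full, xi_sub

def jackknife_sigma(xi_sub, xi_full):
    """Jackknife 1-sigma error per bin: sqrt((N-1)/N * sum_j (xi_j - xi)^2)."""
    xi_sub = np.asarray(xi_sub, dtype=float)
    n = xi_sub.shape[0]
    return np.sqrt((n - 1) / n * np.sum((xi_sub - xi_full) ** 2, axis=0))
```

With eight octants, `xi_sub` has shape (8, n_bins), and `jackknife_sigma(xi_sub, xi_full)` returns one error bar per radial bin.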

The striking difference between the evolution of galaxy and dark matter correlation functions
in Figure 4 is similar to that found in previous studies based on hydrodynamic simulations,
semi-analytic calculations of galaxy formation, and high-resolution N-body simulations that identify
galaxies as "sub-halos" within larger virialized objects (Katz, Hernquist, & Weinberg 1999; Colin
et al. 1999; Kauffmann et al. 1999b; Cen & Ostriker 2000; Benson et al. 2001; Pearce et al. 2001;
Somerville et al. 2001; Yoshikawa et al. 2001). As Figure 3 shows, galaxies (at least those massive
enough to be resolved by our simulation) form at special locations in the density field, and at high
redshift they already trace out the network of filaments nascent in the dark matter distribution.
The dark matter correlation function grows in time as mass moves into this network from the
surrounding regions, but the structure traced by galaxies stays relatively unchanged, and the galaxy
correlation function is only weakly dependent on redshift.

Figure 5 quantifies these points in terms of the parameters $r_0$ and $\gamma$, which we determine from
power-law fits, $\xi(r) = (r/r_0)^{-\gamma}$, to the correlation functions over the full range plotted in
Figure 4, $0.02\,h^{-1}$ Mpc $< r < 12\,h^{-1}$ Mpc. (Note that our figures plot the point for each radial
separation bin at the computed mean separation of pairs in that bin.) For some sparse, high-redshift
samples, the smallest radial bins have no pairs, in which case we start the fit at slightly larger r.
We use the jackknife error bars on the $\xi(r)$ points to weight the fit and determine its statistical
errors. We do not incorporate the covariance of the $\xi(r)$ errors; in principle, we could estimate the
full error covariance matrix by the jackknife method, but in practice such estimates are noisy and not
easy to use. Treating the $\xi(r)$ errors as independent causes us to underestimate the statistical
uncertainty in $r_0$. The dark matter correlation functions are not well described by a power law at
z < 1, where the fits have $\chi^2$/d.o.f. of ~1.5-2, so the derived values of $r_0$ and $\gamma$ are sensitive
to the radial range selected for the fit. Power-law fits to the galaxy correlation functions are nearly
always adequate, and they usually have $\chi^2$/d.o.f. less than one (~0.3-0.7) because we have ignored
the covariance of the errors in $\xi(r)$.
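
A weighted power-law fit of the kind described above might look like the following sketch. It fits a straight line to ln xi versus ln r with weights derived from the per-bin errors and ignores covariance between bins, as the text does; fitting in log space, and the name `fit_power_law`, are assumptions made for this illustration rather than the procedure actually used.

```python
import numpy as np

def fit_power_law(r, xi, sigma_xi):
    """Fit xi(r) = (r / r0)**(-gamma) with independent per-bin errors.

    Works in log space: ln xi = -gamma * ln r + gamma * ln r0, with
    sigma_lnxi = sigma_xi / xi. Returns (r0, gamma, chi2 per d.o.f.).
    """
    r, xi, sigma_xi = (np.asarray(a, dtype=float) for a in (r, xi, sigma_xi))
    good = (xi > 0) & (sigma_xi > 0)          # skip empty or negative bins
    x, y = np.log(r[good]), np.log(xi[good])
    w = (xi[good] / sigma_xi[good]) ** 2      # inverse variance of ln xi
    A = np.column_stack([x, np.ones_like(x)])
    AtW = A.T * w                             # weighted normal equations
    slope, intercept = np.linalg.solve(AtW @ A, AtW @ y)
    gamma = -slope
    r0 = np.exp(intercept / gamma)
    dof = x.size - 2
    resid = y - (slope * x + intercept)
    chi2_dof = float(np.sum(w * resid ** 2) / dof) if dof > 0 else float("nan")
    return float(r0), float(gamma), chi2_dof
```

Because correlated errors are treated as independent, a chi2 per degree of freedom well below one from such a fit is expected and does not by itself indicate overestimated error bars.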

Fig. 5. Evolution of the correlation length $r_0$ (in comoving $h^{-1}$ Mpc) and power-law index $\gamma$, for
all galaxies (solid), the 500 most massive galaxies (dashed), the 200 most massive galaxies
(dot-dashed), and the dark matter (dotted). Error bars are obtained from the power-law fits, using the
jackknife errors on $\xi(r)$. Lines for the 500 largest galaxies stop at z = 2.5, since the complete sample
contains fewer than 500 galaxies at higher redshift; likewise, lines for the 200 largest galaxies stop
at z = 3.5.

Figure 5 shows steady growth of the comoving correlation length of the dark matter, as
expected. The correlation length of the complete galaxy sample, on the other hand, starts at
$r_0 = 5.2\,h^{-1}$ Mpc at z = 4, declines to a minimum of $3.0\,h^{-1}$ Mpc at z = 1.5, and climbs again
to $r_0 = 4\,h^{-1}$ Mpc by z = 0. The $L_*$ sample has a higher correlation length than the complete
sample at z = 0, $r_0 \sim 4.5\,h^{-1}$ Mpc, though the correlation lengths of the two samples converge at
higher redshift as the $L_*$ threshold approaches our completeness threshold. Correlation lengths for
a sample with a higher mass threshold, the 200 most massive galaxies at each redshift, are higher
still, $r_0 = 5.2\,h^{-1}$ Mpc at z = 0, with only a slight decline towards higher redshift. The values of
$\gamma$ for the galaxy correlation functions range from 1.75 to 2.05, with a tendency for larger $\gamma$ for the
more massive galaxy samples or at higher redshifts. At high redshift, the dark matter correlation
function is close to a power law, but a quite shallow one. As the scale of non-linear structure grows,
it produces a "bulge" in the dark matter $\xi(r)$ (see Fig. 4), which drives the correlation function
away from a power law and steepens the effective value of $\gamma$.

The slow evolution of the galaxy correlation length in Figure 5 agrees with results from the
KPNO wide-area survey (Postman et al. 1998), which yields $r_0 \sim 4.5\,h^{-1}$ Mpc (comoving) at z ~ 0.5,
and from the CNOC-2 redshift survey, which shows little evolution of the comoving $r_0$ out to
z ~ 1 (Carlberg et al. 1999). However, the value of $r_0$ depends significantly on galaxy mass
and age (see Figure 7 below), so caution is required when interpreting results of surveys that
may select different types of galaxies at different redshifts. The predicted correlation length at
z = 3, $r_0 = 4.2 \pm 0.3\,h^{-1}$ Mpc, agrees very well with the value $r_0 = 4.3 \pm 0.3\,h^{-1}$ Mpc measured
from spectroscopic samples of Lyman-break galaxies (Adelberger et al. 2002; we have transformed
from their $\Omega_m = 0.3$ cosmology to our $\Omega_m = 0.4$ cosmology using the ratio of angular diameter
distances at z = 3), though the slope $\gamma = 1.90 \pm 0.07$ is steeper than the observed value of
$\gamma = 1.55 \pm 0.15$. Similar values of $r_0$ for Lyman-break galaxies have been obtained using
semi-analytic galaxy formation models (Governato et al. 1998; Kauffmann et al. 1999b; Benson et al.
2001; Somerville et al. 2001) and other hydrodynamic simulations (Cen & Ostriker 2000; Pearce et
al. 2001; Yoshikawa et al. 2001). The value of $r_0$ from our $50\,h^{-1}$ Mpc simulation is somewhat larger
than the estimate $r_0 = 3.1\,h^{-1}$ Mpc (including a factor 2.2 correction for finite box size effects)
derived by Katz, Hernquist, & Weinberg (1999) from a simulation of an $11.111\,h^{-1}$ Mpc cube, but a
difference of this sort is expected because the mass resolution threshold of our present simulation
is a factor of eight higher, and correlation strength increases with galaxy mass.
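
The cosmology rescaling mentioned in the parenthesis above can be illustrated with a short sketch. It assumes that both cosmologies are spatially flat (matter plus cosmological constant), so that the ratio of angular diameter distances equals the ratio of comoving distances; the source paper's cosmology for the observed value is stated only as $\Omega_m = 0.3$, so flatness there is our assumption, and the function names are hypothetical.

```python
import numpy as np

def comoving_distance(z, omega_m, n_steps=20001):
    """Comoving distance to redshift z in units of c/H0 (flat universe assumed)."""
    zz = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + (1.0 - omega_m))
    # trapezoidal rule for the integral of dz / E(z)
    return float(np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz)))

def rescale_length(length, z, omega_from, omega_to):
    """Rescale a comoving transverse length measured at redshift z
    from one flat cosmology to another at fixed angular separation."""
    return length * comoving_distance(z, omega_to) / comoving_distance(z, omega_from)
```

A call such as `rescale_length(r0_observed, 3.0, 0.3, 0.4)` shrinks the length, because a higher matter density gives a smaller distance to z = 3.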

Perhaps the most significant result of this section is the power-law form of the galaxy correlation
function at z = 0. The dark matter correlation function shows the curvature found in previous
N-body studies (e.g., Jenkins et al. 1998), and it is inconsistent with a pure power law. We demonstrate
this point explicitly in Figure 6, where we plot the ratio of the dark matter, complete galaxy, and
$L_*$-galaxy correlation functions to a power law $(r/4.0\,h^{-1}\,\mathrm{Mpc})^{-1.75}$ that has the observed
logarithmic slope. The dark matter correlation function rises above this power law at
$r \sim 0.5\,h^{-1}$ Mpc and falls well below it at $r < 0.05\,h^{-1}$ Mpc. (The departure from a power law is
somewhat less prominent than that in Jenkins et al.'s [1998] $\Lambda$CDM simulation because we adopt
$\sigma_8 = 0.8$ and n = 0.95 instead of $\sigma_8 = 0.9$ and n = 1.) The correlation function of the complete
sample, on the other hand, follows the power law almost perfectly. The situation for the $L_*$ sample is
less clear; here $\xi(r)$ shows a rise at $r < 1\,h^{-1}$ Mpc reminiscent of that shown by the dark matter,
but within our statistical errors, the $\xi(r)$ is adequately described by a power law with
$\gamma = 2.00 \pm 0.08$.

The correlation length of the $L_*$ sample, $r_0 = 4.5 \pm 0.4\,h^{-1}$ Mpc, is smaller than the SDSS value
of $r_0 = 6.1\,h^{-1}$ Mpc, and slightly below the value $r_0 = 4.9\,h^{-1}$ Mpc found for blue-selected galaxies
in the 2dFGRS. The slope $\gamma = 2.00 \pm 0.08$ is steeper than the observed value of $\gamma \approx 1.75$. The low
amplitude appears to be partly an artifact of the particular realization of $\Lambda$CDM initial conditions in
our $50\,h^{-1}$ Mpc simulation volume. We have run particle-mesh N-body simulations (using the code
of Park [1990]) of these initial conditions and of four equivalent volumes with different random
realizations, and we find that the dark matter correlation length of this realization is a factor
of 1.2 lower than that derived from the average correlation function of the other four realizations
($3.79\,h^{-1}$ Mpc vs. $4.60\,h^{-1}$ Mpc), simply by chance. There is also a systematic effect of missing power
on scales larger than $50\,h^{-1}$ Mpc. With PM simulations of $100\,h^{-1}$ Mpc volumes, we find a further
increase in the mean $r_0$ by a factor of 1.1, and a slightly shallower slope ($\gamma = 1.92$ over the range
$1-10\,h^{-1}$ Mpc, vs. $\gamma = 1.97$ for the $50\,h^{-1}$ Mpc realization of the SPH simulation). Unfortunately,
we do not know just how these differences in the dark matter $\xi(r)$ translate to the galaxy $\xi(r)$,
but the statistical and systematic effects in our $50\,h^{-1}$ Mpc volume are large enough that we do not
presently regard the quantitative discrepancies between the predicted and observed $(r_0, \gamma)$ for
$L_*$ galaxies as significant. The statistical uncertainties can be reduced by using the halo occupation
distribution derived from the SPH simulation to populate a larger volume N-body simulation, as
discussed by Berlind et al. (2002).

Berlind & Weinberg (2002) show that the key requirements for obtaining a power-law $\xi(r)$ in a
$\Lambda$CDM model are reducing the efficiency of galaxy formation in the most massive (group and cluster
scale) halos and suppressing pair counts in the lowest mass halos by keeping fluctuations about the
mean occupation number well below the Poisson level (see also Jing, Mo & Börner 1998; Seljak
2000; Peacock & Smith 2000; Scoccimarro et al. 2001). The galaxy population in our simulation
satisfies both of these requirements, a point that we discuss in detail elsewhere (Berlind et al. 2002).
Our conclusion that including galaxy formation physics removes most of the discrepancy between
the predicted correlation function of dark matter and the observed correlation function of galaxies
agrees with conclusions derived from other studies based on semi-analytic modeling (Kauffmann
et al. 1999a; Benson et al. 2000a,b; Somerville et al. 2001), high resolution N-body simulations
(Colin et al. 1999), and numerical hydrodynamics (Pearce et al. 1999, 2001; Cen & Ostriker 2000;
Yoshikawa et al. 2001).



3.2. Correlation Functions of Sub-populations 

As mentioned in §3.1, the galaxy correlation function is known to vary significantly with
galaxy luminosity, color, and morphology (e.g., Davis & Geller 1976; Hamilton 1988; Loveday et
al. 1995; Guzzo et al. 1997; Willmer, da Costa & Pellegrini 1998). The most comprehensive recent
examinations are the studies of luminosity and color dependence for SDSS galaxies by Zehavi et
al. (2002) and the studies of luminosity and spectral type dependence for 2dFGRS galaxies by
Norberg et al. (2001, 2002b). Zehavi et al. (2002) find a steady increase of correlation amplitude
with luminosity, going from 1.5 magnitudes fainter than $L_*$ to ~1.5 magnitudes brighter, and
they find that red galaxies have a higher amplitude and steeper correlation function than blue
galaxies, in agreement with earlier studies of the difference between clustering of early type (red)
and late type (blue) samples. Norberg et al. (2001) find a somewhat different luminosity dependence
for the $b_J$-selected 2dFGRS galaxies, with little dependence of clustering amplitude on luminosity
below $L_*$ but a strong dependence at higher luminosities. Norberg et al. (2002b) find that the
correlation amplitude varies strongly with galaxy spectral type, analogous to the usual dependence
on morphological type, and that the luminosity dependence is present within each spectral type
class. Galaxies with early spectral types have a somewhat steeper correlation function than those
with late spectral types, though the difference in slope is not as striking as that for the SDSS red
and blue galaxy samples.

Fig. 6. Ratio of the z = 0 correlation functions to a power law $\xi(r) = (r/4.0\,h^{-1}\,\mathrm{Mpc})^{-1.75}$, for
the complete galaxy sample (solid), the $L_*$ sample (dotted), and the dark matter (dashed). The
horizontal axis is log r. Small horizontal offsets have been added to the jackknife error bars to
improve clarity. On this plot, the SDSS correlation function would be a horizontal line at
$\log\,(6.1/4.0)^{1.75} = 0.32$, and the 2dFGRS correlation function of $L_*$ galaxies would be a horizontal
line at $\log\,(4.9/4.0)^{1.75} = 0.15$.

Fig. 7. Dependence of the galaxy correlation function on stellar population age (top), baryon
mass (middle), and density of the local environment (bottom), at z = 0. In each panel, the solid
line shows the correlation function and jackknife error bars of the complete galaxy sample, and open
circles show the power-law fit to the SDSS correlation function. Dashed lines show the correlation
function of the older, more massive, or higher density half of the complete sample in panels (a)-(c),
respectively, and dotted lines show the correlation function of the other half of the sample.

Our simulation volume is too small and our computation of galaxy properties too simple to
allow detailed comparisons to these observations, but we can examine qualitative trends. The color
and spectral type of a galaxy depend largely on the age of its stellar population, and morphology
is also known to correlate strongly with population age. For our simulated galaxy population,
we define a galaxy's age by the epoch at which it formed half of its stars. While this approach
necessarily brings our heuristic star formation algorithm into play, the details of this algorithm
should have little effect on the conclusions presented here, since we will use only the relative ages
of the simulated galaxies.

Figure 7a shows the correlation functions of the older (dashed line) and younger (dotted line)
halves of the complete galaxy sample, compared to the correlation function of the complete sample
(solid line). The correlation amplitude exhibits a clear dependence on galaxy age; the older galaxies
have a correlation length of $r_0 = 5.2\,h^{-1}$ Mpc and the younger galaxies have $r_0 = 2.8\,h^{-1}$ Mpc. The
fitted slopes of the correlation functions are nearly identical, $\gamma = 1.84$ and 1.82, respectively.
(Values of $r_0$ and $\gamma$ for all sub-populations appear in Table 2.) These results are in good qualitative
agreement with the spectral type dependence found by 2dFGRS; relative to the SDSS results, they
reproduce the observed color dependence of correlation amplitude but not the observed change of
slope.

Figure 7b compares the correlation functions of the more massive (dashed) and less massive
(dotted) halves of the complete sample. While the more massive galaxies have a higher $\xi(r)$ at all
separations, the difference is marked only below ~$0.5\,h^{-1}$ Mpc. Power-law fits yield nearly identical
correlation lengths, $r_0 \sim 4.0\,h^{-1}$ Mpc, but significantly different slopes ($\gamma = 1.90$ vs. $\gamma = 1.57$).
Table 1 shows that $r_0$ does rise significantly if one takes the 500 most massive ($r_0 = 4.5\,h^{-1}$ Mpc) or
200 most massive ($r_0 = 5.2\,h^{-1}$ Mpc) galaxies, and in combination with Figure 7b it suggests that
this mass dependence sets in at roughly the space density of $L_*$ galaxies. The 2dFGRS results show
a transition of this sort (Norberg et al. 2001), though the SDSS results show a steadier dependence
of clustering amplitude on luminosity (Zehavi et al. 2002).

Figure 7c shows the effect of dividing the galaxy sample based on local density, computed
by smoothing with a spline kernel (the same form used in the SPH calculation) whose radius
is adjusted to always enclose 16 galaxies within the smoothing volume. Since $\xi(r)$ is itself a
measure of overdensity at radius r, it is no surprise that this division produces the most marked
change in the correlation function. The enhancement of $\xi(r)$ persists to scales much larger than
the typical smoothing radius because of the bias of locally dense regions towards overdense large
scale environments (Kaiser 1984; see Mann, Peacock, & Heavens 1998 and Narayanan, Berlind,
& Weinberg 2000 for discussions in the context of N-body models). The observed dependence of
galaxy morphology on local density (e.g., Dressler 1980; Postman & Geller 1984) should therefore
lead to a large scale bias in the clustering of early-type galaxies.
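
An adaptive local density of this kind can be sketched as follows. For simplicity the example swaps the SPH spline kernel for a top-hat sphere reaching the 16th nearest neighbor, uses a brute-force distance matrix suited to samples of a few thousand galaxies, and does not count the galaxy itself; these choices and the function name are assumptions for illustration only.

```python
import numpy as np

def knn_density(pos, box, k=16):
    """Local number density from the distance to the k-th nearest neighbor
    in a periodic cube of side `box` (top-hat kernel, not an SPH spline)."""
    pos = np.asarray(pos, dtype=float)
    d = pos[:, None, :] - pos[None, :, :]
    d -= box * np.round(d / box)              # minimum-image convention
    dist = np.sqrt((d ** 2).sum(axis=-1))
    # after sorting, column 0 is the zero self-distance, column k the k-th neighbor
    r_k = np.sort(dist, axis=1)[:, k]
    return k / (4.0 / 3.0 * np.pi * r_k ** 3)

def split_by_median(values):
    """Boolean mask selecting the upper half of the sample in `values`."""
    values = np.asarray(values, dtype=float)
    return values > np.median(values)
```

The mask from `split_by_median(knn_density(pos, box))` separates the high-density and low-density halves whose correlation functions can then be compared.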

To summarize the results of this and the previous section, we find that our simulated galaxy
population reproduces many of the features of observed galaxy correlation functions: a power-law
form with $\gamma \approx 1.75$ down to separations $r \sim 0.02\,h^{-1}$ Mpc, an increase of correlation strength
with mass that becomes more pronounced at high masses, and a higher correlation amplitude for
galaxies with older stellar populations. The trends of clustering strength with galaxy mass and
age agree with the predictions from semi-analytic models (Kauffmann et al. 1999a; Benson et al.
2000a; Somerville et al. 2001) and other hydrodynamic simulations (Cen & Ostriker 2000; Pearce
et al. 2001; Yoshikawa et al. 2001). The predicted correlation lengths are somewhat low and the
$\xi(r)$ slope for $L_*$ galaxies slightly too steep in comparison to the SDSS results, but our N-body
experiments suggest that these discrepancies are largely explained by the truncation of power at
the scale of our $50\,h^{-1}$ Mpc box and by a lower than average clustering amplitude in the particular
$\Lambda$CDM initial conditions used in our simulation.



3.3. Pairwise Velocities 

Peebles (1980), Davis & Peebles (1983), and Bean et al. (1983) describe the techniques now
commonly used in correlation function analyses of galaxy redshift surveys. Starting from $\xi(r_p, \pi)$,
the correlation as a function of projected and line-of-sight separations, one integrates over $\pi$ to
obtain the projected correlation function $w(r_p)$, which can be easily inverted to obtain the real space
3-d correlation function $\xi(r)$. The full $\xi(r_p, \pi)$ can be calculated, approximately, by a convolution of
$\xi(r)$ with the galaxy pairwise velocity distribution (see Fisher [1995] for a careful discussion of this
approximation and its relation to the linear theory approximation of Kaiser [1987] and Hamilton
[1992]). This distribution is specified by its first and second moments, $v_{12}(r)$ and $\sigma_{12}^2(r)$, and by a
functional form, usually assumed (on both theoretical and observational grounds) to be exponential.
The pairwise dispersion can be inferred from $\xi(r_p, \pi)$ in a fairly robust fashion, though it is sensitive
to the presence of rare, rich clusters in the survey volume (Mo, Jing, & Börner 1993). The mean
pairwise velocity can in principle be deduced as well, given sufficiently good data (see, for example,
the analysis of Fisher et al. 1994); at larger separations, it can also be measured directly in galaxy
peculiar velocity surveys (Ferreira et al. 1999).

Fig. 8. Mean pairwise radial velocities (top panels) and pairwise radial velocity dispersions
(bottom panels) at z = 0 and z = 1, as indicated. Solid lines with jackknife error bars show results
for the complete galaxy sample, while dotted and dashed lines show results for the $L_*$ sample and
the dark matter, respectively. In the z = 0 panels, circles and squares show results for the older
and younger halves of the complete sample, respectively. The pairwise velocity is defined with
Hubble flow removed and a convention of positive sign for galaxies moving towards each other.
The pairwise dispersion is 1-dimensional, along the direction of separation, with the mean pairwise
velocity subtracted.

Figure 8 shows the mean pairwise velocity (top) and pairwise velocity dispersion (bottom) at
z = 0 (left) and z = 1 (right). At z = 1, the mean pairwise velocity of the galaxies is higher
than that of the dark matter, at all separations $0.1-10\,h^{-1}$ Mpc. Although galaxies and dark
matter feel the same gravitational forces and therefore have similar large scale velocity fields,
pair-weighted statistics like $v_{12}(r)$ and $\sigma_{12}(r)$ are also sensitive to spatial bias; in this case, the
preferential location of galaxies in overdense regions with high inflow velocities boosts $v_{12}(r)$. The
mean pairwise velocities of the $L_*$ sample are higher than those of the complete sample, but the
difference is of marginal statistical significance.

By z = 0 (top left), the bias of the galaxy mean pairwise velocities has largely disappeared,
in correspondence with the small spatial bias at z = 0 (see Figure 4, and Figure 9 below). The
complete and $L_*$ galaxy samples again exhibit similar $v_{12}(r)$, with velocities of the $L_*$ sample being
slightly higher. There is a much sharper difference, however, between the older and younger halves
of the complete sample (circles and squares in Figure 8), with the older galaxies having nearly twice
the value of $v_{12}(r)$ on scales of one to several $h^{-1}$ Mpc. This difference reflects the preference of
older galaxies for denser environments with larger inflow velocities.
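
The two pairwise statistics, with the sign convention of the Figure 8 caption (peculiar velocities only, positive for approaching pairs, dispersion measured along the separation vector about the mean), can be sketched as below. The brute-force pair loop over a periodic cube and the function name are assumptions for illustration.

```python
import numpy as np

def pairwise_velocity_stats(pos, vel, box, r_edges):
    """Mean pairwise radial velocity v12 and 1-d dispersion sigma12 per bin.

    pos, vel: (n, 3) positions and peculiar velocities (Hubble flow removed).
    v12 is positive when pairs approach each other; sigma12 is the scatter of
    the radial relative velocity about v12 in each separation bin.
    """
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    r_edges = np.asarray(r_edges, dtype=float)
    i, j = np.triu_indices(len(pos), k=1)     # each pair counted once
    dx = pos[j] - pos[i]
    dx -= box * np.round(dx / box)            # minimum-image separation
    r = np.linalg.norm(dx, axis=1)
    keep = r > 0
    dx, r, dv = dx[keep], r[keep], (vel[j] - vel[i])[keep]
    v_in = -np.sum(dv * dx, axis=1) / r       # infall speed along the separation
    which = np.digitize(r, r_edges) - 1
    nbins = len(r_edges) - 1
    v12 = np.full(nbins, np.nan)
    sig12 = np.full(nbins, np.nan)
    for b in range(nbins):
        v = v_in[which == b]
        if v.size:
            v12[b] = v.mean()
            sig12[b] = v.std()
    return v12, sig12
```

Bins that contain no pairs are returned as NaN rather than zero, so sparse samples are not mistaken for cold ones.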

The pairwise dispersion of the complete galaxy sample is below that of the dark matter both
at z = 1 and at z = 0. The $L_*$ sample's pairwise dispersion is similar to that of the dark matter at
z = 1 and below it at z = 0. All of the galaxy samples show a peak in $\sigma_{12}(r)$ at $r \sim 0.3\,h^{-1}$ Mpc,
roughly the half-mass radius of the richest groups. While one might expect suppression of the galaxy
pairwise dispersion to go hand-in-hand with spatial anti-bias, our results show that the story is not
so simple. For example, the correlation function of the $L_*$ sample at z = 0 is positively biased at all
separations (Figure 6), but its pairwise dispersion is lower than that of the dark matter and nearly
identical to that of the (more weakly clustered) complete galaxy sample. As emphasized by Berlind
& Weinberg (2002) and Sheth et al. (2001), the pairwise dispersion is influenced by factors that do
not affect $\xi(r)$ (or even affect it with opposite sign), making it sensitive to details of the relation
between galaxies and their surrounding dark matter halos. While $\sigma_{12}(r)$ is not sensitive to galaxy
baryon mass (at least in the comparison presented here), it is highly sensitive to stellar population
age: the older half of the complete sample has a substantially higher pairwise dispersion than the
younger half because of the location of older galaxies in more massive virialized structures.

The pairwise dispersion of the complete and $L_*$ galaxy samples agrees reasonably well with the
values $\sigma_{12} \sim 550-650$ km s$^{-1}$ at $r \sim 1\,h^{-1}$ Mpc found in the CfA2, LCRS, and SDSS redshift surveys
(Marzke et al. 1995; Jing, Mo & Börner 1998; Zehavi et al. 2002). However, the sensitivity of $\sigma_{12}(r)$
to rare, rich clusters is reflected in the large jackknife error bars of Figure 8, so we cannot take this
agreement as a major triumph of the simulation. The scale-dependence in Figure 8 contrasts with
the rather flat $\sigma_{12}(r)$ inferred from the observations, but for the same reason we do not view this as
a serious discrepancy. We can conclude from Figure 8 that the predicted difference between dark
matter and galaxy pairwise dispersions makes it easier to understand the relatively low observed
value of $\sigma_{12}$, and that the predicted difference between old and young galaxies naturally accounts
for the factor of ~2 difference in $\sigma_{12}$ derived for red and blue galaxies in the SDSS (Zehavi et al.
2002) and the factor ~1.7 difference in $\sigma_{12}$ derived from optical redshift surveys (as cited above)
and from the IRAS PSCz catalog (as analyzed by Jing, Börner, & Suto 2002).

Our conclusion that the galaxy pairwise dispersion at z = 0 is significantly lower than that
of the dark matter agrees with the results of Pearce et al. (2001), based on a hydrodynamic
simulation, and Benson et al. (2000a), based on populating N-body halos according to semi-analytic
prescriptions. We also agree fairly well with these calculations in terms of the amplitude of the
pairwise dispersion, $\sigma_{12} \sim 500-600$ km s$^{-1}$ at $r \sim 1\,h^{-1}$ Mpc, and the shape of $\sigma_{12}(r)$, though
our pairwise dispersion drops somewhat more rapidly at $r < 0.5\,h^{-1}$ Mpc. Kauffmann et al. (1999a),
also using N-body halos and semi-analytic galaxy models, find only a slight difference between
the dark matter and galaxy pairwise dispersions, and their predicted amplitude at $r \sim 1\,h^{-1}$ Mpc
is $\sigma_{12} \sim 800$ km s$^{-1}$. Benson et al. (2000a) argue that the difference between their results and
Kauffmann et al.'s arises because their (Benson et al.'s) semi-analytic model places fewer galaxies
in high dispersion halos. Since the halo occupation of our simulation agrees well with that of the
Benson et al. (2000a) models (Berlind et al. 2002), we think that the same explanation applies to
our results. We agree with Kauffmann et al. (1999a) with respect to the mean pairwise velocity
(not examined by the other groups), with both calculations showing little difference between the
galaxies and the dark matter at z = 0. While age divisions (or color divisions) were not examined
in these other studies, the tendency to find older, redder galaxies in more massive halos appears
to be a generic prediction of both hydrodynamic and semi-analytic calculations, so we expect the
strong age-dependence of $\sigma_{12}(r)$ and $v_{12}(r)$ to be a fairly generic result.



3.4. Moments of Galaxy Counts

Moments of the galaxy count distribution in spherical (or other) cells provide an alternative to
correlation functions for characterizing galaxy clustering. The n-th central moment of the galaxy
counts is closely related to the n-point correlation function averaged over the cell volume. As pointed
out by Szapudi & Szalay (1993), for a discrete galaxy distribution it is convenient to work with
factorial moments such as $\langle N(N-1)\rangle$ and $\langle N(N-1)(N-2)\rangle$ (where N is the number of galaxies
within a cell or sphere), since these automatically remove Poisson shot noise contributions and can
therefore be related to the moments of an underlying continuum distribution in a straightforward
way.

Fig. 9. Evolution of the bias factor $b_\sigma$, defined by the ratio of rms galaxy count fluctuations
(corrected for shot noise) to rms dark matter fluctuations in spheres of comoving radius $8\,h^{-1}$ Mpc
(heavy lines) or $2\,h^{-1}$ Mpc (light lines). Solid, dotted, and dashed lines show results for the complete
galaxy sample, the 500 most massive galaxies, and the 200 most massive galaxies, respectively.
Lines for the 500 largest galaxies stop at z = 2.5, since the complete sample contains fewer than
500 galaxies at higher redshift; likewise, lines for the 200 largest galaxies stop at z = 3.5. Circles
show the number-conserving model of Fry (1996), normalized to a bias factor of 3.1 at redshift
z = 2.5.

Fig. 10. The hierarchical amplitude $S_{3f}$, defined as the ratio of the third factorial moment to the
square of the second factorial moment in spheres of radius r, at z = 0. Points connected by the
solid line show results for the complete galaxy sample, with jackknife error bars. Other lines show
results for the dark matter (short-dashed), the $L_*$ sample (dotted), and the older (long-dashed)
and younger (dot-dashed) halves of the complete sample. The value of $S_3$ inferred from angular
clustering in the APM galaxy catalog is roughly constant at $S_3 \approx 3.5$ over this range of scales.

The ratio $b_\sigma$ of rms galaxy count fluctuations to rms mass fluctuations provides a simple and
robust measure of galaxy bias, and the value of $b_\sigma$ for spheres of radius $8\,h^{-1}$ Mpc is the number
most often quoted as the "bias factor" of the galaxy distribution. Figure 9 shows $b_\sigma(8\,h^{-1}\,\mathrm{Mpc})$ and
$b_\sigma(2\,h^{-1}\,\mathrm{Mpc})$ as a function of redshift, with

$$ \sigma^2 \equiv \frac{\langle N(N-1)\rangle}{\langle N\rangle^2} - 1 \qquad (3) $$

and the moments evaluated using 10 spheres placed randomly in the simulation volume. At
z = 0 the bias factors are similar for $r = 2\,h^{-1}$ Mpc and $8\,h^{-1}$ Mpc, with $b_\sigma(8\,h^{-1}\,\mathrm{Mpc}) = 1.02$ for
the complete galaxy sample, 1.22 for the $L_*$ sample, and 1.40 for the 200 most massive galaxies.
Figures 5-8 show that the simulated galaxies do not trace the mass in detail and that galaxy
clustering depends on age and luminosity, but our complete galaxy sample is, in an rms sense,
nearly unbiased at z = 0.
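
A counts-in-spheres measurement based on factorial moments can be sketched as follows. The shot-noise-corrected variance and third moment used here are the standard continuum-limit expressions; whether they match the paper's exact estimators, and all function names, are assumptions of this illustration.

```python
import numpy as np

def counts_in_spheres(pos, box, radius, n_spheres, rng):
    """Number of points inside randomly placed spheres in a periodic cube."""
    pos = np.asarray(pos, dtype=float)
    centers = rng.uniform(0.0, box, size=(n_spheres, 3))
    counts = np.empty(n_spheres, dtype=int)
    for k, c in enumerate(centers):
        d = pos - c
        d -= box * np.round(d / box)          # minimum-image convention
        counts[k] = np.count_nonzero((d ** 2).sum(axis=1) < radius ** 2)
    return counts

def factorial_moment_stats(counts):
    """Shot-noise-corrected variance and hierarchical ratio from counts.

    variance = <N(N-1)>/<N>^2 - 1
    third    = <N(N-1)(N-2)>/<N>^3 - 3 <N(N-1)>/<N>^2 + 2
    Returns (variance, third / variance**2).
    """
    n = np.asarray(counts, dtype=float)
    m1 = n.mean()
    f2 = (n * (n - 1)).mean()
    f3 = (n * (n - 1) * (n - 2)).mean()
    variance = f2 / m1 ** 2 - 1.0
    third = f3 / m1 ** 3 - 3.0 * f2 / m1 ** 2 + 2.0
    return variance, third / variance ** 2

def bias_sigma(variance_galaxies, variance_mass):
    """rms bias factor: ratio of rms galaxy to rms mass fluctuations."""
    return float(np.sqrt(variance_galaxies / variance_mass))
```

Applying `factorial_moment_stats` to galaxy counts and to dark matter counts in the same spheres, then passing the two variances to `bias_sigma`, gives an rms bias factor of the kind plotted in Figure 9.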

As shown earlier in terms of $\xi(r)$, the galaxy bias starts high and declines towards low redshift as
the dark matter clustering catches up. For the complete sample, $b_\sigma(8\,h^{-1}\,\mathrm{Mpc}) = 3.3$ at z = 3. At
redshifts z ~ 1, the bias factor is significantly larger at $2\,h^{-1}$ Mpc than at $8\,h^{-1}$ Mpc, presumably
reflecting the different behavior of $b_\sigma$ in the non-linear and linear regimes. Fry (1996) discusses
an analytic model in which the galaxy population is born at some redshift $z_*$ with a bias factor
$b_*$ but thereafter moves with the same velocity as the dark matter and conserves number (see also
Tegmark & Peebles 1998). In this case the bias factor at redshift $z < z_*$ is given by
$b(z) - 1 = (b_* - 1)\,D(z_*)/D(z)$, where D(z) is the linear growth factor at redshift z. Open circles in
Figure 9 show the prediction of this model, normalized to the bias factor $b_* = 3.1$ of the 500-galaxy
sample at $z_* = 2.5$. The actual bias factor of this sample declines significantly faster than the
number-conserving model predicts, even though the sample contains 500 galaxies at all redshifts,
presumably because mergers drive down the number of pairs in dense regions and galaxies that form at
z < 2.5 are born in less biased environments. Somerville et al. (2001) obtain a similar result with
semi-analytic modeling.
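
The number-conserving bias evolution can be evaluated numerically with a short sketch. It assumes a flat universe with matter and a cosmological constant, for which the linear growth factor is proportional to $E(a) \int_0^a da' / [a' E(a')]^3$; the default $\Omega_m = 0.4$ follows the simulation parameters, while the function names and integration grid are choices made for this example.

```python
import numpy as np

def growth_factor(z, omega_m=0.4, n_steps=20001):
    """Unnormalized linear growth factor D(z), flat matter + Lambda universe."""
    a = 1.0 / (1.0 + z)
    omega_l = 1.0 - omega_m
    aa = np.linspace(1e-8, a, n_steps)        # start just above a = 0
    e = np.sqrt(omega_m / aa ** 3 + omega_l)  # E(a) = H(a) / H0
    f = 1.0 / (aa * e) ** 3
    integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(aa))  # trapezoidal rule
    return float(np.sqrt(omega_m / a ** 3 + omega_l) * integral)

def fry_bias(z, z_star, b_star, omega_m=0.4):
    """Bias at z < z_star for a population born at z_star with bias b_star:
    b(z) - 1 = (b_star - 1) * D(z_star) / D(z)."""
    ratio = growth_factor(z_star, omega_m) / growth_factor(z, omega_m)
    return 1.0 + (b_star - 1.0) * ratio
```

Only the ratio of growth factors enters, so the overall normalization of D(z) does not matter. Calling `fry_bias(z, 2.5, 3.1)` over a grid of redshifts traces a curve of the kind shown by the open circles in Figure 9.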

Discussions of the third moment of the count distribution often focus on the hierarchical ratio
$S_3 \equiv \langle\delta^3\rangle / \langle\delta^2\rangle^2$. For spherical cells and a linear power spectrum $P(k) \propto k^n$, the dark
matter has $S_3 = 34/7 - (3 + n)$ to second order in perturbation theory, independent of sphere radius
and power spectrum amplitude and nearly independent of $\Omega_m$ (Juszkiewicz, Bouchet, & Colombi 1993).
If the galaxy density contrast is a local function of the matter density contrast, $\delta_g = f(\delta)$, then the
relation between $S_{3g}$ and $S_3$ (representing the galaxy and mass moments, respectively) depends on
both the linear bias factor $b_1$ and a second bias factor $b_2$ that characterizes the second derivative
$f''(\delta)$ at $\delta = 0$ (Fry & Gaztañaga 1993; Juszkiewicz et al. 1995). The behavior of $S_3$ and the relation
between $S_{3g}$ and $S_3$ become more complicated in the fully non-linear regime where second-order
perturbation theory breaks down, but it remains the case that $S_3$ is only weakly dependent on scale
and that the ratio $S_{3g}/S_3$ depends on the non-linearity of the relation between galaxy and mass
densities.
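
For reference, the leading-order local-bias relation can be written out explicitly. The Taylor-expansion convention below (with the factor of 1/2 on the quadratic term) is an assumption; other conventions absorb that factor into $b_2$ and change the coefficient accordingly.

```latex
\delta_g = f(\delta) \simeq b_1\,\delta + \frac{b_2}{2}\,\delta^2 + \dots
\quad\Longrightarrow\quad
S_{3g} = \frac{1}{b_1}\left( S_3 + \frac{3\,b_2}{b_1} \right),
\qquad
S_3 = \frac{34}{7} - (3 + n).
```

As a worked example, a spectral index n = -1 gives $S_3 = 34/7 - 2 = 20/7 \approx 2.86$ for the mass; a linearly biased population with $b_1 = 2$ and $b_2 = 0$ would then have $S_{3g} \approx 1.43$.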

Figure 10 shows the hierarchical ratio $S_{3f}$, defined in terms of factorial moments,
at $z = 0$. For the complete galaxy sample, $S_{3f} \sim 3.5$ to $4$ for $r = 1$ to $4\,h^{-1}$ Mpc, rising to $S_{3f} = 6.5$
at $r = 8\,h^{-1}$ Mpc. These values are slightly below those of the dark matter distribution. Ratios for
the more massive, $L_*$ sample are lower still; these galaxies are more clustered in an rms sense, but
they are less "skewed" in the sense measured by $S_{3f}$. As emphasized by Colombi et al. (2000), the
statistical and systematic errors in measurements of $S_3$ from a volume this size are substantial, and
our jackknife error bars are large. Thus, while the predicted value of $S_{3f}$ for the complete galaxy
sample agrees well with the value $S_3 \approx 3.5$ that Gaztañaga & Frieman (1994) infer by de-projecting
the APM angular count moments, we cannot claim this as a major success. (The recent analyses
of angular clustering in the SDSS by Gaztañaga [2002] and Szapudi et al. [2002] suggest that the
APM numbers should be adjusted upward by 20%.) However, the relative values of $S_{3f}$ for the
different populations should be more robustly predicted, and our simulation indicates that galaxies
comparable to those in the complete sample should have a hierarchical ratio slightly below that of
the dark matter, and that $S_{3f}$ should be smaller for more massive galaxies. The most striking result
in Figure 10 is the strong scale-dependence of $S_{3f}$ for young galaxies, a prediction that should be
testable with the SDSS and 2dFGRS surveys in the near future.
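
For readers who want to experiment with counts-in-cells, the following is a minimal sketch of a factorial-moment estimate of the variance and hierarchical skewness. It assumes the standard Poisson-sampling relations $\langle N(N-1)\rangle = \bar{N}^2\langle(1+\delta)^2\rangle$ and $\langle N(N-1)(N-2)\rangle = \bar{N}^3\langle(1+\delta)^3\rangle$; this may differ in detail from the exact definition of $S_{3f}$ used for Figure 10, and the function name is hypothetical.

```python
import numpy as np


def s3_from_counts(counts):
    """Shot-noise-corrected variance and S_3 from counts in equal-volume cells.

    Uses factorial moments, assuming galaxies Poisson-sample an underlying
    continuous density field (an assumption of this sketch)."""
    n = np.asarray(counts, dtype=float)
    nbar = n.mean()
    f2 = np.mean(n * (n - 1.0))              # second factorial moment
    f3 = np.mean(n * (n - 1.0) * (n - 2.0))  # third factorial moment
    var = f2 / nbar**2 - 1.0                 # <delta^2>
    third = f3 / nbar**3 - 3.0 * f2 / nbar**2 + 2.0  # <delta^3>
    return var, third / var**2


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy example: Poisson-sample a lognormal density field in 10^5 cells.
    density = rng.lognormal(mean=-0.125, sigma=0.5, size=100_000)
    counts = rng.poisson(20.0 * density)
    print(s3_from_counts(counts))
```

Because the third moment is divided by the square of a noisy variance, the estimate is unstable when the cells are few or nearly unclustered, consistent with the large jackknife errors noted above.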

4. Galaxy-Mass Correlations 

Advances in wide-field CCD imaging have opened a new window on the relation between 
galaxies and mass: weak lensing measurements of the galaxy-mass cross-correlation function. This 
statistic can be derived from galaxy-galaxy lensing analyses, which measure the mean shear profile 
around a sample of foreground galaxies using the tangential distortion of background galaxies 
(Tyson et al. 1984; Brainerd, Blandford, & Smail 1996; dell'Antonio & Tyson 1996; Griffiths et
al. 1996; Hudson et al. 1998; Fischer et al. 2000; Hoekstra, Yee, & Gladders 2001; McKay et al.
2001; Smith et al. 2001; Wilson et al. 2001b). Alternatively, one can correlate the mass map
derived from the ellipticity correlation function of faint galaxies with the light map of brighter,
foreground galaxies in the same images (Wilson, Kaiser, & Luppino 2001a). In addition, cosmic
shear surveys are beginning to yield measurements of the matter correlation function itself
(for specified $\Omega_m$), with impressively good agreement among independent surveys (see Mellier et
al. 2001; Hoekstra, Yee, & Gladders 2002; and references therein), allowing for the comparison of 
galaxy and mass fluctuation amplitudes. 

Figure 11 shows the galaxy-mass cross-correlation function for the complete galaxy sample and
for the subsamples divided by age (top), mass (middle), and local density (bottom). We define
$\xi_{gm}(r) = \langle \delta_g(\mathbf{x})\,\delta_m(\mathbf{x}+\mathbf{r}) \rangle$, where $r = |\mathbf{r}|$ and
$\delta_g = n_g/\bar{n}_g - 1$ and $\delta_m \equiv \delta = \rho/\bar{\rho} - 1$ are the galaxy and mass
(dark + baryonic) overdensities; we compute it by counting galaxy-particle pairs, with particles
weighted by mass. While the form of $\xi_{gm}(r)$ at small $r$ is connected to the typical galaxy halo
profile, the connection is rather loose because the galaxies are not isolated and the samples contain
objects with a range of halo virial radii (points emphasized by Guzik & Seljak [2001], White,
Hernquist, & Springel [2001], and Yang et al. [2002]). We use the same format in Figure 11 and
Figure 7, and the two figures bear a striking resemblance to each other. To a first approximation,
the galaxy-mass correlation function has the same shape as the galaxy-galaxy correlation function,
similar amplitude, and similar dependence on galaxy age, baryon mass, and environment. However,
the variations of $\xi_{gm}(r)$ for different subsamples are smaller than the variations of $\xi_{gg}(r)$, as one
would expect, given that the mass distribution stays the same regardless of the galaxy subsample.
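
The pair-counting definition above can be sketched as follows. This is a brute-force illustration for a periodic cube, not the code used for Figure 11; the function name, argument names, and the choice of a simple natural estimator (mass-weighted pair counts divided by the expectation for a uniform mass distribution) are all assumptions of the sketch. It is only meaningful for separations smaller than half the box size.

```python
import numpy as np


def xi_gm(gal_pos, part_pos, part_mass, box, r_edges):
    """Galaxy-mass cross-correlation from mass-weighted galaxy-particle pair counts.

    gal_pos, part_pos : (N, 3) arrays of positions in a periodic cube of side `box`
    part_mass         : (N_part,) particle masses (dark matter and baryons together)
    r_edges           : radial bin edges, with max(r_edges) < box / 2
    """
    gal_pos = np.asarray(gal_pos, dtype=float)
    part_pos = np.asarray(part_pos, dtype=float)
    part_mass = np.asarray(part_mass, dtype=float)
    r_edges = np.asarray(r_edges, dtype=float)

    mean_density = part_mass.sum() / box**3
    shell_volume = 4.0 / 3.0 * np.pi * np.diff(r_edges**3)

    mass_in_shell = np.zeros(len(r_edges) - 1)
    for g in gal_pos:
        d = part_pos - g
        d -= box * np.round(d / box)  # minimum-image separation in the periodic box
        r = np.sqrt((d**2).sum(axis=1))
        hist, _ = np.histogram(r, bins=r_edges, weights=part_mass)
        mass_in_shell += hist

    expected = len(gal_pos) * mean_density * shell_volume
    return mass_in_shell / expected - 1.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    galaxies = rng.random((50, 3))
    particles = rng.random((20000, 3))
    # Unclustered particles should give values scattered around zero.
    print(xi_gm(galaxies, particles, np.ones(20000), 1.0, [0.05, 0.1, 0.2]))
```

Looping over galaxies keeps memory use modest; for simulation-sized particle sets a tree or grid-based neighbor search would be needed instead.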

With measurements of $\xi_{gg}(r)$, $\xi_{gm}(r)$, and the mass correlation function $\xi_{mm}(r)$, one can define
two "bias functions," $b_\xi(r) = [\xi_{gg}(r)/\xi_{mm}(r)]^{1/2}$ and $b_{\xi\times}(r) = \xi_{gg}(r)/\xi_{gm}(r)$. The ratio
$r_\times(r) = b_\xi(r)/b_{\xi\times}(r) = \xi_{gm}(r)/[\xi_{gg}(r)\,\xi_{mm}(r)]^{1/2}$ corresponds to a "correlation coefficient" as defined by,
e.g., Tegmark & Bromley (1999), though here we work in terms of correlation functions rather than
power spectra. In the case of linear, deterministic bias, where $\delta_g(\mathbf{x}) = b\,\delta(\mathbf{x})$ at all positions $\mathbf{x}$,
$b_\xi$ and $b_{\xi\times}$ would be equal to $b$ and independent of scale, and $r_\times$ would equal unity. Figure 12 plots
$b_\xi(r)$, $b_{\xi\times}(r)$, and $r_\times(r)$ for the complete galaxy sample, the $L_*$ sample, and the older and younger
halves of the complete sample. We will focus on the $z = 0$ results, but because measurements
of these quantities come from deep imaging surveys, we also show the simulation predictions at
$z = 0.5$, which are qualitatively similar.

Fig. 11. The cross-correlation function $\xi_{gm}(r)$ of galaxies and mass (dark matter + baryons)
at $z = 0$. In each panel, the solid line with jackknife error bars shows results for the complete
galaxy sample. Dashed lines show results for the older, more massive, or higher density half of
the complete sample in panels (a)-(c), respectively, and dotted lines show results for the other
half of the sample. Circles are the same as those in Fig. 7, to facilitate visual comparison to the
autocorrelation functions.

Fig. 12. Scale-dependence of the bias factors defined by $b_\xi(r) = [\xi_{gg}(r)/\xi_{mm}(r)]^{1/2}$ (top) and by
$b_{\xi\times}(r) = \xi_{gg}(r)/\xi_{gm}(r)$ (middle), and the "correlation coefficient" $r_\times(r) = b_\xi(r)/b_{\xi\times}(r)$ (bottom),
where $\xi_{gg}(r)$ and $\xi_{mm}(r)$ are the galaxy and mass autocorrelation functions and $\xi_{gm}(r)$ is the
galaxy-mass cross-correlation function. Results are shown for the complete galaxy sample (solid),
the $L_*$ sample (dotted), and the older (short-dashed) and younger (long-dashed) halves of the
complete sample, at redshifts $z = 0$ (left column) and $z = 0.5$ (right column).
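
The three quantities defined above follow directly from tabulated correlation functions. The sketch below is a minimal illustration (function and variable names are hypothetical) and also checks the limiting case stated in the text: for linear, deterministic bias both bias functions equal $b$ and the correlation coefficient is unity.

```python
import numpy as np


def bias_functions(xi_gg, xi_gm, xi_mm):
    """Return (b_xi, b_xi_cross, r_cross) on a common grid of separations.

    b_xi       = sqrt(xi_gg / xi_mm)
    b_xi_cross = xi_gg / xi_gm
    r_cross    = b_xi / b_xi_cross = xi_gm / sqrt(xi_gg * xi_mm)
    """
    xi_gg, xi_gm, xi_mm = (np.asarray(a, dtype=float) for a in (xi_gg, xi_gm, xi_mm))
    b_xi = np.sqrt(xi_gg / xi_mm)
    b_xi_cross = xi_gg / xi_gm
    r_cross = b_xi / b_xi_cross
    return b_xi, b_xi_cross, r_cross


if __name__ == "__main__":
    xi_mm = np.array([10.0, 1.0, 0.1])  # toy mass correlation function
    b = 1.5                              # toy linear, deterministic bias
    print(bias_functions(b**2 * xi_mm, b * xi_mm, xi_mm))
```

Since the inputs are ratios of correlation functions rather than moments of smoothed fields, nothing in this construction prevents `r_cross` from exceeding unity.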

The complete sample has $b_\xi \approx 1$, but there is a broad minimum around $0.5\,h^{-1}$ Mpc, with
$b_\xi(0.5\,h^{-1}\,{\rm Mpc}) \approx 0.75\,b_\xi(10\,h^{-1}\,{\rm Mpc})$. It is precisely this mild scale-dependence of $b_\xi(r)$ that
transforms the curved dark matter correlation function into a power-law galaxy correlation function
(see Figure 6). The $L_*$ sample has higher $b_\xi$, roughly flat (though noisy) at $r > 0.5\,h^{-1}$ Mpc, and
climbing towards small $r$. The old galaxies have $b_\xi(r)$ similar to that of the $L_*$ sample, while the
young galaxies have low $b_\xi(r)$ with a scale-dependence similar to that of the complete sample.

The scale-dependence of $b_{\xi\times}(r)$ is generally weaker than that of $b_\xi(r)$. At $r > 1\,h^{-1}$ Mpc, values
of $b_{\xi\times}$ are similar to those of $b_\xi$ for the corresponding sample, except in the case of the young galaxies,
which have $b_{\xi\times} > b_\xi$ and thus $r_\times(r) < 1$. This "anti-correlation" arises because the young galaxies
tend to avoid the higher mass halos, which contribute significantly to $\xi_{mm}(r)$. The complete, $L_*$, and
old galaxy samples have $r_\times(r) \sim 1$ at large scales and $r_\times(r) > 1$ at small scales. While a correlation
coefficient defined in terms of smoothed density fields, e.g.
$\langle\delta_g(\mathbf{x})\,\delta_m(\mathbf{x})\rangle / [\langle\delta_g^2(\mathbf{x})\rangle\langle\delta_m^2(\mathbf{x})\rangle]^{1/2}$, must
lie in the range $[-1, 1]$, the same is not true for $r_\times(r)$, which is defined by ratios of correlation
functions. The high values of $r_\times(r)$ at small $r$ reflect the location of a significant fraction of
galaxies (especially in the $L_*$ sample) near the central density maxima of dark matter halos. Galaxy-mass
and galaxy-galaxy pairs involving these central galaxies boost $\xi_{gm}(r)$ and $\xi_{gg}(r)$, respectively,
relative to the values expected if galaxies traced mass within halos. There is no corresponding
boost to $\xi_{mm}(r)$; and since $r_\times(r) \propto \xi_{gm}(r)/[\xi_{gg}(r)]^{1/2}$, it tends to rise above unity on scales where
central galaxies make major contributions to the correlation functions (for related discussions, see
Seljak 2000 and Berlind & Weinberg 2002).

Hoekstra, Yee, & Gladders (2001) measure a ratio of galaxy-mass and galaxy-galaxy fluctuations
that is directly analogous to $b_{\xi\times}(r)$, though it is defined in compensated apertures rather
than annuli $r \rightarrow r + \Delta r$. In our terminology, their results imply $b_{\xi\times} \approx 1.26$ for an
$\Omega_m = 0.4$ cosmology (converted from their $b/r_\times = 1.05$ for $\Omega_m = 0.3$ using their equation 14),
with no detectable scale-dependence in the range $r \approx 0.1$ to $1.5\,h^{-1}$ Mpc. This result is intermediate
between that of our complete and $L_*$ galaxy samples at $z = 0.5$. (The median redshift of
the Hoekstra et al. lens sample is $z \sim 0.35$, where the bias should be intermediate between our
$z = 0$ and $z = 0.5$ values.) While we have not modeled the Hoekstra et al. procedure in detail, our
prediction of a mild positive bias with no substantial scale dependence and a bias factor similar to
that applying on larger scales appears to be fully compatible with their results. The approximate
scale-independence of $b_{\xi\times}(r)$ also agrees with Wilson et al.'s (2001a) finding of similar shapes for
the galaxy-galaxy and galaxy-mass angular correlation functions. It is difficult for us to compare
the amplitude of $b_{\xi\times}$ to Wilson et al.'s result, in part because they weight galaxies by flux, and in
part because they interpret their measurement in terms of a value of $\Omega_m$ assuming that their (red,
luminous) lens galaxy population traces the mass. The existence of positively and negatively biased
samples with approximately scale-independent $b_{\xi\times}(r)$ in Figure 12 demonstrates that a constant
ratio $\xi_{gg}(r)/\xi_{gm}(r)$ does not, on its own, justify this assumption.

One can also use galaxy-galaxy lensing to infer the average mass within a sphere of radius 
r centered on a sample galaxy, an approach emphasized by McKay et al. (2001). (Note that this 
measurement makes no attempt to distinguish mass that "belongs" to the central galaxy from mass 
that "belongs" to its neighbors or to the common halo of its group or cluster; all mass counts, and 
it may be counted multiple times for different galaxies.) Figure 13a shows the average mass profile 
from $25\,h^{-1}$ kpc to $1\,h^{-1}$ Mpc in five different bins of galaxy baryon mass, each a factor of two in
width. We express the mass in terms of an equivalent circular velocity $V_c = [G\,\Delta M(r)/r]^{1/2}$, which
would be constant in the case of a singular isothermal sphere, though of course few galaxies would
have any tracers in circular rotation at these large radii. We use the excess mass $\Delta M$, subtracting
the mean value $4\pi r^3\bar{\rho}/3$, because it is the excess mass that is constrained by lensing, but the
difference is negligible on the scale of these plots.
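
The conversion from excess mass to equivalent circular velocity can be sketched as below. The helper names are hypothetical, the gravitational constant is quoted in kpc (km/s)$^2$ per solar mass, and radii and masses are assumed to be in physical kpc and solar masses (any factors of $h$ must be applied by the caller).

```python
import numpy as np

G_KPC = 4.30091e-6  # gravitational constant in kpc (km/s)^2 / Msun


def excess_mass(total_mass_msun, r_kpc, mean_density_msun_kpc3):
    """Total mass in a sphere minus the mean-density contribution 4*pi*r^3*rho_bar/3."""
    r = np.asarray(r_kpc, dtype=float)
    background = 4.0 / 3.0 * np.pi * r**3 * mean_density_msun_kpc3
    return np.asarray(total_mass_msun, dtype=float) - background


def circular_velocity(excess_mass_msun, r_kpc):
    """Equivalent circular velocity V_c = sqrt(G * DeltaM(r) / r), in km/s."""
    return np.sqrt(G_KPC * np.asarray(excess_mass_msun, dtype=float)
                   / np.asarray(r_kpc, dtype=float))


if __name__ == "__main__":
    r = np.array([25.0, 100.0, 1000.0])
    # A singular isothermal sphere has M(r) = V^2 r / G, so V_c should be flat.
    m_iso = 200.0**2 * r / G_KPC
    print(circular_velocity(m_iso, r))
```

A flat output for the isothermal test case is the behavior against which the rising and falling profiles in Figure 13a are judged.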

For the lowest mass galaxies, $M_b = (0.5-1) \times 10^{11} M_\odot$, the mean circular velocity curve is fairly
flat at $V_c \sim 200$ km s$^{-1}$ out to $r \sim 100\,h^{-1}$ kpc, but it rises at larger radii because these galaxies
typically reside in groups whose characteristic circular velocities are larger than those of the galaxies
themselves. Conversely, the highest mass galaxies have a mean circular velocity curve that falls
all the way out to $r = 1\,h^{-1}$ Mpc, because these massive galaxies typically reside at the centers
of potential wells whose mass profiles (baryon + dark) are steeper than isothermal. Intermediate
mass ranges have intermediate behavior, with falling $V_c$ profiles at small $r$ changing to flat or rising
profiles at $r \sim 100$ to $200\,h^{-1}$ kpc. Dashed and dotted curves show $V_c(r)$ for the older and younger
halves, respectively, of the $(1-2) \times 10^{11} M_\odot$ sample. At small radii ($r < 50\,h^{-1}$ kpc), the circular
velocity is the same for the two subsamples, but the preferential location of older galaxies in more
massive groups manifests itself as a higher $V_c$ at larger radii.

Figure 13b plots the ratio of the aperture mass $\Delta M(r)$ to galaxy baryon mass $M_b$ (which is
approximately equal to the stellar mass), as a function of $M_b$. From bottom to top, solid points
connected by solid lines show results for $r = 50$, 100, 260, 500, and $1000\,h^{-1}$ kpc. We use $260\,h^{-1}$ kpc
rather than $250\,h^{-1}$ kpc to facilitate comparison to McKay et al.'s (2001) analysis of galaxy-galaxy
lensing in the SDSS. We average in the order appropriate for such a study, computing the mean
value of the excess mass in bins of $M_b$, dividing by the mean $M_b$ of the bin, and plotting the result
for each bin at the value of the mean $M_b$. Note that this ratio counts all excess mass within radius $r$,
including that associated with neighboring galaxies, but divides by the baryon mass of the central
galaxy only. For $r = 50\,h^{-1}$ kpc the $\Delta M(r)/M_b$ curve is essentially flat: on average, a galaxy's
baryon mass is proportional to the total mass in a $50\,h^{-1}$ kpc sphere around it. For larger $r$, the ratio
rises towards smaller $M_b$, which makes sense if the apertures around these galaxies start to include
more mass that "belongs" to their richer neighbors. Dotted and dashed lines are computed for the
older and younger halves of the galaxies in each bin, at $r = 260\,h^{-1}$ kpc. For $M_b < 5 \times 10^{11} M_\odot$,
the older galaxies have substantially higher $\Delta M(260\,h^{-1}\,{\rm kpc})$, by a factor of 1.5 to 2, analogous to
the factor ~2.7 difference that McKay et al. (2001) find for elliptical and spiral subsamples. At
higher masses, there is little difference in age between the older and younger halves of the sample
(essentially all massive galaxies are old), and the systematic difference in $\Delta M(260\,h^{-1}\,{\rm kpc})$ narrows.

Fig. 13. The mass distribution around galaxies at $z = 0$. (a) Solid curves show the circular
velocity $V_c(r) = [G\,\Delta M(r)/r]^{1/2}$ as a function of radius, where $\Delta M(r)$ is the mean total excess mass
in spheres of radius $r$ centered on galaxies. From bottom to top, the curves show results for galaxies
with baryon mass (stars plus cold gas) in the range 0.5-1, 1-2, 2-4, 4-8, and 8-16, in units
of $10^{11} M_\odot$. Dotted and dashed curves show results for, respectively, the older and younger halves
of the galaxies in the $(1-2) \times 10^{11} M_\odot$ mass range. (b) Solid curves show the ratio of the mean total
excess mass within radius $r$ to the baryon mass of the central galaxy as a function of the galaxy
baryon mass, for radii $r = 50$, 100, 260, 500, and $1000\,h^{-1}$ kpc (bottom to top). Dotted and dashed
curves show results at $r = 260\,h^{-1}$ kpc for galaxies that are older and younger, respectively, than
the median age of galaxies in the mass bin.

Fig. 14. Comparison of the simulation results for $\Delta M(260\,h^{-1}\,{\rm kpc})/M_b$ to observational estimates
from the SDSS. The solid curve (same as the central solid curve in Fig. 13b) shows the prediction
from our $(50\,h^{-1}\,{\rm Mpc})^3$ simulation. The dashed curve shows this prediction with baryon masses
rescaled to account for numerical resolution effects, based on our comparison of higher resolution
simulations of smaller volumes. It represents our best guess at the result we would obtain from a
simulation using $2 \times 288^3$ particles instead of $2 \times 144^3$. Points show the weak lensing results of McKay
et al. (2001), converted from $\Delta M(260\,h^{-1}\,{\rm kpc})/L$ to $\Delta M(260\,h^{-1}\,{\rm kpc})/M_b$ using the estimates of
$M_{\rm stel}/L$ as a function of $L$ from Kauffmann et al. (2002). Estimates from g, r, i, and z band are
represented by triangles, squares, pentagons, and circles, respectively. We plot error bars on the
z-band points only, but they are similar for other bands.
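
The averaging order described for Figure 13b (mean excess mass in a bin divided by the mean baryon mass of that bin, plotted at the mean baryon mass) differs from the mean of individual ratios. A minimal sketch, with hypothetical function and argument names:

```python
import numpy as np


def aperture_mass_ratio(m_b, delta_m, bin_edges):
    """Ratio of means <DeltaM> / <M_b> in bins of baryon mass.

    Returns the mean M_b of each non-empty bin and the corresponding ratio.
    Galaxies outside [bin_edges[0], bin_edges[-1]) are ignored."""
    m_b = np.asarray(m_b, dtype=float)
    delta_m = np.asarray(delta_m, dtype=float)
    index = np.digitize(m_b, bin_edges) - 1

    mean_mb, ratio = [], []
    for i in range(len(bin_edges) - 1):
        selected = index == i
        if not selected.any():
            continue
        mean_mb.append(m_b[selected].mean())
        ratio.append(delta_m[selected].mean() / m_b[selected].mean())
    return np.array(mean_mb), np.array(ratio)


if __name__ == "__main__":
    # Toy numbers in arbitrary mass units; bins a factor of two wide as in the text.
    print(aperture_mass_ratio([1.0, 1.5, 3.0], [10.0, 20.0, 90.0], [1.0, 2.0, 4.0]))
```

A stacked lensing measurement constrains the mean excess mass of the lens sample, which is why the ratio of means, rather than the mean ratio, is the quantity to compare.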

Figure 14 compares our predictions to the results of McKay et al. (2001), who estimate
$\Delta M(260\,h^{-1}\,{\rm kpc})$ by fitting isothermal spheres to their measured shear profiles in four wide bins of
galaxy luminosity. They find that $\Delta M(260\,h^{-1}\,{\rm kpc})/L$ is roughly independent of galaxy luminosity
$L$ in the Sloan g, r, i, and z bands. To convert $\Delta M(260\,h^{-1}\,{\rm kpc})/L$ to $\Delta M(260\,h^{-1}\,{\rm kpc})/M_b$
for comparison with our predictions, we draw on the results of Kauffmann et al. (2002), who estimate
ratios of stellar mass to observed luminosity by fitting population synthesis and dust extinction
models to SDSS galaxy spectra and colors. The trend of mean stellar mass-to-light ratio
with galaxy luminosity is well described by the relations $\langle M_{\rm stel}/L \rangle = \Upsilon_* (L/L_*)^\beta$, with
$(\Upsilon_*, \beta) = (2.65, 0.375)$ in g, $(2.29, 0.375)$ in r, $(2.06, 0.35)$ in i, and $(1.57, 0.275)$ in z, where
$L_* = 1.11$, 1.51, 2.05, and $2.58 \times 10^{10}\,h^{-2} L_\odot$ in the four bands, respectively (Blanton et al. 2001,
for $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$).$^1$ These relations fit the Kauffmann et al. (2002) results to within ~10% for galaxies
within two magnitudes of $L_*$, though they break down outside of this range. We take the values
of $\Delta M(260\,h^{-1}\,{\rm kpc})/L$ from McKay et al.'s Table 2, convert to $h = 0.65$, and divide by the value
of $\langle M_{\rm stel}/L \rangle$ at the central luminosity of each bin (which McKay et al. compute using the same
galaxy weights that enter the lensing analysis). In Figure 14, triangles, squares, hexagons, and
circles show results derived from the g, r, i, and z band mass-to-light ratios, respectively. Results
in the four bands are not independent because they rely on the same lensing data, but the generally
good agreement among them suggests that there are no major systematic errors in the population
synthesis and dust extinction modeling. The data points come in four groups reflecting the four
luminosity bins, though the g band points do not reach the highest $M_b$ values because of the higher
$M_{\rm stel}/L$ in g band. The solid line in Figure 14 shows the simulation result for all galaxies (same as
the middle solid line of Fig. 13b).
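
The conversion from a lensing mass-to-light ratio to a mass-to-stellar-mass ratio can be sketched with the fitted relations quoted above. The coefficients and $L_*$ values are those given in the text; the function names are hypothetical, luminosities are assumed to be supplied in the same $h^{-2} L_\odot$ units as $L_*$, and the rescaling of the lensing masses to $h = 0.65$ is left to the caller because the text does not spell it out.

```python
# (Upsilon_*, beta) and L_* per band, as quoted in the text.
UPSILON_BETA = {"g": (2.65, 0.375), "r": (2.29, 0.375), "i": (2.06, 0.35), "z": (1.57, 0.275)}
L_STAR = {"g": 1.11e10, "r": 1.51e10, "i": 2.05e10, "z": 2.58e10}  # h^-2 Lsun


def stellar_mass_to_light(luminosity, band):
    """Mean M_stel / L = Upsilon_* (L / L_*)^beta.

    The fit is stated to hold only within about two magnitudes of L_*."""
    upsilon, beta = UPSILON_BETA[band]
    return upsilon * (luminosity / L_STAR[band]) ** beta


def lensing_mass_per_stellar_mass(delta_m_over_l, luminosity, band):
    """Convert DeltaM / L to DeltaM / M_stel at the central luminosity of a bin."""
    return delta_m_over_l / stellar_mass_to_light(luminosity, band)


if __name__ == "__main__":
    # Toy input: DeltaM / L = 100 (solar units) for an L_* galaxy in the r band.
    print(lensing_mass_per_stellar_mass(100.0, L_STAR["r"], "r"))
```

Because $\beta > 0$ in every band, a $\Delta M/L$ that is flat with luminosity translates into a $\Delta M/M_{\rm stel}$ that declines with luminosity.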

The predicted and observationally inferred values of $\Delta M(260\,h^{-1}\,{\rm kpc})/M_b$ agree well in the
lowest mass bin, but at higher masses the observational points lie above the simulation predictions,
by a factor of ~1.5 to 2 on average. We have treated our galaxies as entirely stellar, and we have ignored
the light "missed" by SDSS Petrosian luminosities (see Strauss et al. 2002), but accounting for these
effects would reduce the discrepancy by only ~10 to 20%. We conclude that the baryon masses
of our simulated galaxies are on average too high by a factor of ~1.5 to 2, for $M_b > 2 \times 10^{11} M_\odot$,
provided that the Kroupa (2002) stellar IMF assumed in Kauffmann et al.'s (2002) population
synthesis modeling is indeed universal across galaxy types and evolutionary stages. Other hints of
a similar discrepancy include the high overall fraction of stellar mass in the simulation (Davé et
al. 2001) and the relatively high stellar mass-to-light ratios required to match the observed galaxy
luminosity function (Katz et al., in preparation).

$^1$Kauffmann et al.'s Figure 14 plots median $M_{\rm stel}/L$ values and interquartile ranges in bins of absolute magnitude.
Since it is the mean $M_{\rm stel}/L$ that matters for our purposes, we have computed these means using the tabulated
$M_{\rm stel}/L$ distributions kindly provided by G. Kauffmann, excluding the 10% tails of the distributions so that the mean
is not distorted by extreme $M_{\rm stel}/L$ values that may be a result of poor model fits. Our conclusions would not be
substantially different if we instead used the median $M_{\rm stel}/L$ values plotted by Kauffmann et al. (2002).

We believe that these excessive galaxy baryon masses are largely numerical in origin. A
comparison of two simulations of the same $(22.22\,h^{-1}\,{\rm Mpc})^3$ volume, one with the same resolution as
the $(50\,h^{-1}\,{\rm Mpc})^3$ simulation analyzed here and one with a mass resolution eight times higher, shows
that the same galaxies are on average more massive in the lower resolution simulation, by ~10% at
the $5 \times 10^{10} M_\odot$ completeness limit rising to a factor ~2 at masses of several $\times 10^{11} M_\odot$ (Fardal et
al., in preparation; one can see hints of this effect in the mass function comparison of Figure 2). The
higher baryon masses at low resolution probably arise from the overestimated cooling rate of hot gas
particles in the vicinity of cold, dense clumps (Pearce et al. 1999; Croft et al. 2001). An alternative
formulation of SPH that performs better in the presence of contact discontinuities between cold
and hot phases indeed leads to lower galaxy baryon masses, typically by a factor ~2 (Springel &
Hernquist 2002). Empirically, we find that the relation Mhr = 5 x lO^°M0(Mii./5.4 x lO^°M0)°-^2 
provides a fairly accurate scaling between masses in the "low resolution" and "high resolution" 
simulations. The dashed curve in Figure 14 shows the effect of applying this empirical scaling. It 
represents our best guess at the results we would obtain from a $2 \times 288^3$ particle simulation of the
same $(50\,h^{-1}\,{\rm Mpc})^3$ volume. Comparison to still higher resolution simulations at $z = 3$ suggests
that this is sufficient to yield numerically converged baryon masses, but we have not yet been able
to carry out such a test at $z = 0$.

Clearly, there are still systematic uncertainties in the observational results (the weak lensing
measurements themselves and the $M_{\rm stel}/L$ scaling) and in the numerical convergence of the
simulation predictions. However, the resolution-corrected predictions and the observationally inferred
mass ratios agree well at $\log M_b/M_\odot \sim 10.7$ and 11.7, with a factor ~2 gap at the intermediate
mass scale $\log M_b/M_\odot \sim 11.4$. The sharp drop between the second and third groups of observational
data points is somewhat surprising, at least given the smooth predicted trend, so a full assessment
of the comparison should await improved observational estimates, which should be available soon.
Nonetheless, the current level of overall agreement does not leave much room for additional
astrophysical processes, such as AGN feedback or stronger stellar feedback, to substantially reduce
the baryon masses of galaxies with $M_b > 5 \times 10^{10} M_\odot$. We would probably predict lower values
of $\Delta M(260\,h^{-1}\,{\rm kpc})/M_b$ (and thus require some suppression of baryon mass) if we adopted lower
$\Omega_m$ or higher $\Omega_b$ without changing the assumed $\sigma_8$ or $P(k)$ shape. However, for our adopted
cosmological parameters, the lensing comparison suggests that this simulation already incorporates 
the astrophysics needed to understand the baryon masses of luminous galaxies, within the current 
numerical and observational uncertainties. The constraints from galaxy-galaxy lensing complement 
those from the Tully-Fisher relation and the luminosity function, which we will discuss elsewhere 
(Katz et al., in preparation). 
5. Summary and Discussion

For the full population of galaxies with $M_b > 5 \times 10^{10} M_\odot$, our simulation predicts a bias factor
$b_\sigma(8\,h^{-1}\,{\rm Mpc})$ of unity at $z = 0$. Beneath this rms similarity of fluctuation amplitudes, however,
lie numerous differences between the galaxy and dark matter distributions, in terms of present-day 
structure and the evolution of that structure. 

1. The bias defined by rms fluctuations or by correlation functions is strongly redshift dependent.
If we choose a galaxy population of fixed comoving space density (above a redshift-dependent
baryon mass threshold), then the galaxy correlation function stays approximately constant from
$z = 3$ to $z = 0$, so that bias is initially large and declines towards unity as the dark matter catches
up to the galaxies (see Figure 4).

2. The bias is scale-dependent in the non-linear regime, and this scale dependence itself
depends on redshift. At $z = 0$, the galaxy correlation function is depressed below the dark matter
correlation function at $r \sim 0.5\,h^{-1}$ Mpc and boosted above it at $r < 0.1\,h^{-1}$ Mpc. As a consequence,
the galaxy correlation function is well described by a power law (of slope $\gamma = 1.78 \pm 0.05$), while
the dark matter correlation function is not (see Figure 6). The dark matter correlation function
is shallower at high redshift while the galaxy correlation function slope is roughly constant, so the
rms bias factors at $2\,h^{-1}$ Mpc and $8\,h^{-1}$ Mpc start to diverge at $z > 1$ (see Figure 9).

3. The correlation amplitude depends on galaxy baryon mass, with only a slight difference
between the more and less massive halves of the complete sample (Figure 7b) but significant
increases of $r_0$ and $b_\sigma$ if we select the 500 or 200 most massive galaxies in the simulation volume
(Figures 5 and 9, Table 2).

4. The dependence of the correlation amplitude on stellar population age is more striking than 
the dependence on baryon mass. The correlation functions of the older and younger halves of the 
complete sample have almost identical slopes, but their correlation lengths differ by nearly a factor 
of two (Figure 7, Table 2). 

5. At $z = 0$, the pairwise velocity dispersion $\sigma_{12}(r)$ of galaxies is lower than that of the dark
matter, by ~20% (Figure 8). There is a large difference between the pairwise dispersions of old and
young galaxies, with the preferential location of old galaxies in dense environments making $\sigma_{12}(r)$
about twice as high at $r < 2\,h^{-1}$ Mpc. The mean pairwise velocity $V_{12}(r)$ is similar for galaxies and
dark matter at $z = 0$, but the denser environments of older galaxies again cause a higher $V_{12}(r)$. At
$z = 1$, the positive spatial bias of the galaxy population makes the mean pairwise velocity higher
than that of the dark matter, though the pairwise dispersion remains somewhat lower.

6. The skewness of galaxy count fluctuations, as quantified by the hierarchical ratio $S_{3f}$, is
lower than that of the dark matter by ~20% (Figure 10). The $S_{3f}$ ratio and its scale dependence
vary with galaxy baryon mass and age, in a fairly complex way.

7. The galaxy-mass cross-correlation function depends on galaxy mass and age in much the
same way that the galaxy-galaxy correlation function itself does, though the trends are weaker
because the mass distribution is the same in all cross-correlations (Figure 11). For each class of
galaxies, the bias function defined by the ratio $\xi_{gg}(r)/\xi_{gm}(r)$ is roughly scale-independent, even
when it is significantly different from unity, though it can be somewhat different in amplitude and
shape from the analogous bias function $[\xi_{gg}(r)/\xi_{mm}(r)]^{1/2}$ (Figure 12).

8. The average extended mass distributions around galaxies depend significantly on baryon
mass and age (Figure 13). The distribution around a typical high mass galaxy falls more steeply
than an isothermal distribution, so that circular velocity curves $[G\,\Delta M(r)/r]^{1/2}$ decline from
$r = 25\,h^{-1}$ kpc out to $r = 1\,h^{-1}$ Mpc. For low mass galaxies, on the other hand, the average mass
profile is shallower than isothermal at $r > 100\,h^{-1}$ kpc, where the individual galaxy halos begin to
encounter the environment of a typical surrounding group. This effect is much more pronounced
for old galaxies than for young galaxies. The ratio of mean aperture mass $\Delta M(r)$ to galaxy baryon
mass $M_b$ is roughly independent of baryon mass for apertures ranging from $r = 50\,h^{-1}$ kpc to
$r = 1\,h^{-1}$ Mpc, though the ratio is higher for the galaxies with the lowest $M_b$. The ratio is higher
for old galaxies than for young galaxies, again demonstrating the preferential location of old galaxies
in dense environments.

For the most part, the dark matter clustering of the ΛCDM model and these biases of the
galaxy population lead to good qualitative and, where we have adequate statistics for comparison,
quantitative agreement with observed galaxy clustering. The predicted galaxy correlation function
is a power law of the observed slope and approximately the observed amplitude (see further
discussion below). The redshift-independence of the comoving correlation length agrees with
measurements from Lyman-break galaxy surveys at $z \approx 3$ (Adelberger et al. 1998, 2002) and from deep
redshift surveys extending to $z \sim 1$ (Postman et al. 1998; Carlberg et al. 1999). The predicted
dependence of clustering strength on baryon mass and population age agrees with the trends as a
function of luminosity, color, and spectral type found in the 2dFGRS and SDSS (Norberg et al.
2001, 2002b; Zehavi et al. 2002). The predicted value of the hierarchical ratio $S_{3f}$ agrees with
measurements from angular clustering catalogs (Gaztañaga & Frieman 1994; Gaztañaga 2002; Szapudi
et al. 2002). The value and scale-independence of the bias inferred from the ratio of galaxy-galaxy
and galaxy-mass correlation functions agree with recent observational estimates (Hoekstra, Yee,
& Gladders 2001). The dependence of the galaxy-mass correlation function on population age
parallels the dependence on galaxy type found by McKay et al. (2001), and the aperture mass
$\Delta M(260\,h^{-1}\,{\rm kpc})$ is approximately proportional to galaxy baryon mass, as McKay et al. (2001) also
find.

There are three quantitative discrepancies between our predictions and existing measurements,
two fairly subtle and one more substantial. First, the power law slope of $\xi(r)$ for the $L_*$-galaxy
sample (the 500 most massive galaxies in the box at $z = 0$) is slightly too steep, $\gamma = 2.00 \pm 0.08$
compared to observed values of 1.7 to 1.8, and there are hints of a departure from a power law
at $r < 1\,h^{-1}$ Mpc. Second, the correlation length of this sample, $r_0 = 4.5 \pm 0.4\,h^{-1}$ Mpc, is low
compared to the value $r_0 = 6.3 \pm 0.8\,h^{-1}$ Mpc measured for $r$-selected galaxies with $L \approx L_*$ in the
SDSS (Zehavi et al. 2002), though it is consistent with the value $r_0 = 4.9 \pm 0.3\,h^{-1}$ Mpc found for
$b_J$-selected galaxies in the 2dFGRS (Norberg et al. 2001). Because of the still-limited size of
our simulation volume, these discrepancies are of marginal statistical significance, and the N-body
tests discussed in §3.1 imply that the low correlation length, at least, is a consequence of the lower
than average fluctuation amplitude in the particular realization of ΛCDM initial conditions used
here.
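
The slope $\gamma$ and correlation length $r_0$ compared above parameterize a power law $\xi(r) = (r/r_0)^{-\gamma}$. The sketch below recovers them with an unweighted least-squares fit in log space; this is an illustrative simplification with a hypothetical function name, not the fitting procedure (or error treatment) used for the quoted values.

```python
import numpy as np


def fit_power_law(r, xi):
    """Fit xi(r) = (r / r0)^(-gamma) by linear least squares in log-log space.

    Requires xi > 0 at every separation; returns (r0, gamma)."""
    slope, intercept = np.polyfit(np.log(r), np.log(xi), 1)
    gamma = -slope
    r0 = np.exp(intercept / gamma)  # ln xi = -gamma ln r + gamma ln r0
    return r0, gamma


if __name__ == "__main__":
    r = np.array([0.5, 1.0, 2.0, 4.0, 8.0])  # separations in h^-1 Mpc
    xi = (r / 4.5) ** -2.0                   # noiseless toy input
    print(fit_power_law(r, xi))
```

An unweighted log-space fit treats all bins equally, so with real, noisy correlation functions the recovered $r_0$ and $\gamma$ are strongly covariant and sensitive to the fitted range.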

The more serious quantitative discrepancy is the ratio of aperture mass AM(260^~^kpc) to 
galaxy baryon mass Mi,. For Mf, ~ (0.5 — 1) x 1O^^M0, the simulation prediction agrees with the 
result derived by combining McKay et al.'s (2001) values of AM/L with Kauffmann et al.'s (2002) 
stellar mass-to-light ratios, but at higher Mi, the predicted ratios are lower than the observationally 
inferred values by factors of ~ 1.5 — 2, implying that the galaxy baryon masses in the simulation 
are too high (Figure 14). As discussed in §4, we believe that this discrepancy is mostly numerical 
in origin, since our internal resolution tests (Fardal et al., in preparation) and comparisons between 
different SPH implementations (Springel &; Hernquist 2002) suggest that galaxy baryon masses 
are indeed overestimated by about this factor in this mass range, relative to a simulation with 
higher resolution or more accurate treatment of the interface between hot and cold gas phases. 
A comparison between our best guess at resolution-corrected results and the McKay et al. (2001) 
data points yields fairly good agreement in the average values of AM {26Qh~^kpc) / Mi,, which leaves 
little room for additional astrophysical processes, such as more aggressive feedback, to substantially 
suppress gas cooling and star formation in this galaxy mass range. Consistent with this conclusion, 
Yang et al. (2002) find that the N-body -|- semi-analytic models of Kauffman et al. (1999a), which 
do include much stronger stellar feedback, predict AM(260/i^^kpc)/L ratios that are too low (by 
a factor ~ 2) relative to those of McKay et al. (2001). As emphasized by Guzik & Seljak (2002) 
and Seljak (2002a, b), the constraints from galaxy-galaxy lensing measurements complement those 
from the galaxy luminosity function and the TuUy-Fisher and fundamental plane relations because 
they suffer different systematic uncertainties and have different sensitivity to astrophysical and 
cosmological parameters. For example, lowering $\Omega_m$ while keeping other parameters (including the 
linear power spectrum) fixed would tend to lower $\Delta M(r)/L$ while having relatively little impact 
on the luminosity function (though galaxy formation, and hence galaxy luminosities, would still be 
affected by the different timescales and baryon-to-dark matter ratios of the lower $\Omega_m$ model). The 
weak lensing constraints will become considerably more powerful as the precision and detail of the 
measurements improve. 
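
As a rough illustration of this scaling, the sketch below assumes that the mean excess mass is the volume integral of the galaxy-mass correlation function, $\Delta M(r) = 4\pi\bar\rho_m \int_0^r \xi_{gm}(r')\,r'^2\,dr'$ with $\bar\rho_m = \Omega_m \rho_{\rm crit}$, and adopts a purely illustrative power law $\xi_{gm} = (r/r_0)^{-\gamma}$. The function name and the default values of `r0` and `gamma` are assumptions chosen for the example, not results from the simulation.

```python
import math

# Critical density today, in h^2 Msun / Mpc^3.
RHO_CRIT = 2.775e11


def excess_mass(r, omega_m, r0=5.0, gamma=1.8):
    """Mean excess mass (h^-1 Msun) within radius r (h^-1 Mpc).

    Assumes a toy power-law galaxy-mass correlation function
    xi_gm(r) = (r / r0)**(-gamma) with gamma < 3, for which the
    integral 4*pi*rho_bar * int_0^r xi_gm(r') r'^2 dr' is analytic.
    r0 and gamma are illustrative values only.
    """
    rho_bar = omega_m * RHO_CRIT  # mean matter density
    return (4.0 * math.pi * rho_bar * r0**gamma
            * r**(3.0 - gamma) / (3.0 - gamma))


# Excess mass at 260 h^-1 kpc for two values of Omega_m,
# holding xi_gm (and hence the clustering) fixed.
dm_high = excess_mass(0.26, omega_m=0.4)
dm_low = excess_mass(0.26, omega_m=0.3)
ratio = dm_low / dm_high
```

In this toy model the ratio is 0.75: with the correlation function held fixed, the excess mass scales linearly with $\Omega_m$, so $\Delta M(r)/L$ drops by the same factor if luminosities are unchanged. A real change in $\Omega_m$ would also alter $\xi_{gm}$ and the luminosities somewhat, as noted above.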

Where we have been able to compare, our results generally agree well with those of other 
hydrodynamic simulations (Pearce et al. 1999, 2001; Cen & Ostriker 2000; Yoshikawa et al. 2001), 
high resolution N-body simulations that identify galaxies with sub-halos (Colin et al. 1999; Kravtsov 
& Klypin 1999), and combined N-body + semi-analytic models (Kauffmann, Nusser, & Steinmetz 
1997; Governato et al. 1998; Kauffmann et al. 1999a,b; Benson et al. 2000a,b, 2001). All of these 
methods make qualitatively similar predictions for the evolution of the galaxy correlation function, 
the bias between galaxies and mass at the present day, the dependence of that bias on galaxy 
type, and the relative amplitude of dark matter and galaxy pairwise dispersions. This agreement 
suggests that these aspects of galaxy clustering and bias arise from robust aspects of the physics 
of galaxy formation that all of these methods treat correctly, or at least similarly. Our conclusions 
about $S_3$ and galaxy-mass correlations are for the most part new, but we do not expect them to 
be fundamentally different from the predictions of other methods. 

Returning to the main goals of the paper, we find that the combination of a $\Lambda$CDM cosmology 
with standard ideas about galaxy formation is, on the whole, remarkably successful at reproducing 
observed galaxy clustering and galaxy-galaxy lensing measurements. In terms of physical 
interpretation, the simulation analysis offers a number of insights. It shows that the detailed relation 
between the galaxy and dark halo populations can account for the difference between the predicted 
correlation function of dark matter and the observed power law form of the galaxy correlation 
function, and between the predicted pairwise velocity dispersion of the dark matter and the 
observed, lower dispersion of galaxies. It shows that the observed dependence of galaxy clustering on 
galaxy properties emerges naturally from the dependence of galaxy mass and population age on 
environment: basically, older and more massive galaxies form in regions that collapse early and 
are today biased with respect to the overall mass distribution. It shows that values of ^ 1 and 
~ 1 can emerge even when galaxies do not trace mass in detail. The quantitative discrepancies 
discussed above may also prove instructive, as the observations improve and the uncertainties in 
the simulations themselves are more thoroughly understood. 

Our analysis makes a number of predictions that can be tested with future data, or with future 
analyses of existing data. Our quantitative comparison to the observed type dependence of galaxy 
clustering has been limited by our finite simulation volume, which prevents us from predicting the 
clustering of rare classes of galaxies. However, it would be straightforward to test the quantitative 
predictions presented in, for example, Figure 7 by taking an observed sample above a luminosity 
threshold that yields the same space density as our complete sample and dividing it in two based 
on luminosity or color (or, better still, on the basis of stellar mass and population age, using 
techniques like those of Kauffmann et al. 2002). The predicted evolution of the mean pairwise 
velocity and pairwise dispersion (Figure 8) can be tested with upcoming large redshift surveys like 
the DEEP and VIRMOS-VLT programs. The systematic variations of $S_3$ with galaxy luminosity 
and age (Figure 10) can be tested with the 2dFGRS and SDSS. The most informative comparisons 
will probably come from improved measurements of galaxy-galaxy lensing and cosmic shear, which 
will allow detailed tests of the trends predicted in Figures 12 and 13. 
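
The first of these tests, selecting an observed sample at a matched space density and then dividing it in two, can be sketched as follows. This is a minimal illustration that assumes a volume-limited catalogue supplied as a plain list of luminosities; the function name and its arguments are hypothetical and do not correspond to any survey pipeline.

```python
def split_at_matched_density(luminosities, volume, target_density):
    """Select the brightest galaxies at a matched space density, then halve.

    luminosities   : luminosities of all galaxies in a volume-limited sample
    volume         : survey volume, in the same length units as target_density
    target_density : space density of the reference (simulated) sample

    Returns (threshold, bright_half, faint_half), where threshold is the
    luminosity of the faintest galaxy kept (None if no galaxy is kept).
    """
    # Number of galaxies needed to reproduce the target space density.
    n_keep = min(len(luminosities), int(round(target_density * volume)))

    # Rank by luminosity and keep the brightest n_keep objects.
    ranked = sorted(luminosities, reverse=True)[:n_keep]
    threshold = ranked[-1] if ranked else None

    # Divide the matched sample in two by luminosity rank.
    half = n_keep // 2
    return threshold, ranked[:half], ranked[half:]


# Toy catalogue: ten galaxies in a volume of 100 (arbitrary units),
# matched to a target density of 0.06 per unit volume.
threshold, bright, faint = split_at_matched_density(
    [3.0, 9.0, 1.0, 7.0, 5.0, 10.0, 2.0, 8.0, 4.0, 6.0],
    volume=100.0,
    target_density=0.06,
)
```

The same ranking could be applied to color, stellar mass, or population age in place of luminosity; the correlation functions of the two halves would then be compared with the predicted curves.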

On all of these fronts, observational efforts are advancing at a staggering pace. Comparisons 
between more comprehensive data and improved theoretical predictions over the next few years 
should tell us whether our understanding of dark matter clustering and the physics of galaxy 
formation is indeed complete, at least in terms of features that have major quantitative impact, or 
whether there are still important ingredients missing from our theoretical recipe. 